This is my course page for the R exercises in the Spatial Analysis and Data Exploration in History and Archaeology course, University of Helsinki, lectured by Eljas Oksanen in the Spring of 2021.

How to create a course page:

You need to have R, RStudio and Git installed and Git linked to your Rstudio before this step.

  1. Create a new GitHub repository

  2. Open RStudio and create a new project by selecting: “File” - “New Project” - “Version Control” - “Git” and paste the web URL in the “Repository URL” box.

  3. Create a web page on GitHub:

  1. Go back to the GitHub repository.

  2. Open “Settings” and scroll to “GitHub Pages”. From “Source” select “master branch”.

  3. Now your course diary web page is online at github_username.github.io/repository_name

1: Introduction to R and SADE

In importing the .csv data to R there was a problem with importing European style .csv in which data is separated with ; instead of comma, and decimals are displayed with commas instead of dots. This was resolved by adding [sep=“;”, dec=“,”] when importing the data. I did also need to define the margin size for plots in order to display the results, but otherwise the exercise didn’t pose much difficulty.

Importing and exploring data

# I tested if the file is readable and if my path is correct:
file.exists("~/R/SADE/week1/axes.csv")
## [1] TRUE
# Importing the data
setwd("~/R/SADE/week1")
axes <- read.csv(file="axes.csv", header=TRUE, sep=";", dec=",")
class(axes)
## [1] "data.frame"
head(axes)
##   km_number find_type length width thickness weight       x      y
## 1     32039   kirveet  120.0  40.0        40  420.0 6987601 559225
## 2     33070   kirveet   91.0  50.0        25  331.0 7204072 587496
## 3     35902   kirveet  197.0  75.0        37  620.0 7638696 492759
## 4     39149   kirveet  193.0  82.0        45 1016.0 6941672 467565
## 5     39178   kirveet  174.0  41.0        84  759.8 6802153 328646
## 6     39452   kirveet   99.5  53.5        30  248.1 6748437 371160
summary(axes)
##    km_number      find_type             length          width       
##  Min.   :32039   Length:16          Min.   : 55.0   Min.   : 40.00  
##  1st Qu.:39171   Class :character   1st Qu.:110.4   1st Qu.: 54.62  
##  Median :39599   Mode  :character   Median :143.0   Median : 70.50  
##  Mean   :38635                      Mean   :136.8   Mean   : 75.97  
##  3rd Qu.:40106                      3rd Qu.:163.5   3rd Qu.: 92.25  
##  Max.   :40334                      Max.   :197.0   Max.   :151.00  
##                                                                     
##    thickness      weight             x                 y         
##  Min.   : 9   Min.   :  88.1   Min.   :6705920   Min.   :328646  
##  1st Qu.:29   1st Qu.: 317.5   1st Qu.:6746623   1st Qu.:364874  
##  Median :37   Median : 466.9   Median :6872906   Median :469289  
##  Mean   :36   Mean   : 501.4   Mean   :6969577   Mean   :468313  
##  3rd Qu.:40   3rd Qu.: 652.5   3rd Qu.:7201614   3rd Qu.:577979  
##  Max.   :84   Max.   :1016.0   Max.   :7638696   Max.   :587496  
##  NA's   :3
summary(axes$weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    88.1   317.5   466.9   501.4   652.5  1016.0
summary(axes$length)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    55.0   110.4   143.0   136.8   163.5   197.0
sd(axes$weight)
## [1] 277.1391
sd(axes$thickness, na.rm=T)
## [1] 17.46425
# Plotting length vs weight
par(mai=c(1,1,1,1))
plot(x=axes$length, y=axes$weight, col="black", pch=16, main="Axe heads recovered by members of the public")

# Boxplot: Axe length and width
boxplot(axes$length, axes$width, ylab="millimeters")

# Barplot: Axe weight, no sorting
barplot(axes$weight, ylab="grams")

# Barplot: Axe weight, sorted
barplot(sort(axes$weight), ylab="grams")

# Histogram: Axe weight, default
hist(axes$weight)

# Histogram: Axe weight, breaks defined
hist(axes$weight, breaks=seq(0, 1200, 100))

# Histogram: Combination 
# Note: This works in R, but not in the knitted html page
dev.new(device=pdf, height=6, width=12)
par(mfrow=c(1,2), mai=c(1,1,1,1))
barplot(sort(axes$weight), ylab="grams", names.arg = c(axes$km_number), las=3)
hist(axes$weight)

Plotting maps and spatial data

# Necessary libraries + Finland ESRI Shapefile
library(rgdal)
library(raster)

fin <- readOGR(dsn="finland", layer="finland_valtakunta_simpler")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week1\finland", layer: "finland_valtakunta_simpler"
## with 1 features
## It has 4 fields
## Integer64 fields read as strings:  GML_ID
# Summary of Finland ESRI Shapefile
summary(fin)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##          min       max
## x   83725.35  732881.9
## y 6637635.87 7776440.6
## Is projected: TRUE 
## proj4string :
## [+proj=utm +zone=35 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m
## +no_defs]
## Data attributes:
##     GML_ID            NATCODE            NAMEFIN            NAMESWE         
##  Length:1           Length:1           Length:1           Length:1          
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character
# Plotting the data with monuments from .csv file
plot(fin, col="lightgrey")
monu <- read.csv(file="monuments_fha.csv", header=TRUE, sep=",")
coordinates(monu) <- ~X+Y
crs(monu) <- crs(proj4string(fin))
points(monu, pch=19, cex=0.1)
points(monu[monu$period == "stone age", ], pch=19, cex=0.1, col="red")

2: Point Pattern Analysis

Point Densities

# Importing the data
# Info about readOGR function: ?readOGR
library(rgdal)
library(raster)
library(spatstat)
library(maptools)

setwd("~/R/SADE/week2")
polyg <- readOGR(dsn="englandwales", layer="engwales_simple")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week2\englandwales", layer: "engwales_simple"
## with 1 features
## It has 1 fields
# Digital Elevation Model (DEM) taken from NASA’s Shuttle Radar Topography Mission data
dem <- raster("dem/engwales_dem.tif")
plot(polyg)
plot(dem, add=T)

# Change the color scheme to more intuitive one
plot(dem, add=T, col=terrain.colors(5))

# Loading medieval market sites from Samantha Letters’s (2002) Gazetteer of Markets and Fairs in England and Wales to 1516.
markets <- readOGR(dsn="markets1334", layer="markets1334")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week2\markets1334", layer: "markets1334"
## with 2293 features
## It has 7 fields
points(markets, pch=19, cex=0.2)

class(markets)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
head(markets, n=10)
##     ID     MODNAME         COUNTY DEFYR BOROUGH VAL_1334 by1200
## 1   28  CANTERBURY           KENT   600       1     0.00      1
## 2    7      LONDON      MIDDLESEX   650       1 11000.00      1
## 3   34        YORK      YORKSHIRE   700       1  1620.00      1
## 4   32 SOUTHAMPTON      HAMPSHIRE   750       1   511.17      1
## 5   43   ROCHESTER           KENT   821       1     0.00      1
## 6   24   WORCESTER WORCESTERSHIRE   899       1   300.00      1
## 7    8 WALLINGFORD      BERKSHIRE   900       1    96.24      1
## 8   57      EXETER          DEVON   900       1   366.17      1
## 9  119     HALWELL          DEVON   900       1    13.50      1
## 10  29     LYDFORD          DEVON   900       1    11.67      1
# Manipulating the data: 

# Lets make sure that the two have the same coordinate reference system (CRS) by taking the polygon’s CRS and applying it to the markets.
markets <- spTransform(markets, CRS(proj4string(polyg)))

# Subsetting the markets within the polygon's area to drop out those that fall outside of its borders
markets <- markets[polyg, ]

# Markets are events rather than places - let's save the data that contains duplicates as "market_events" and then remove duplicates from "markets"
market_events <- markets
markets <- remove.duplicates(markets)

# Creating spastat object from markets
sp_markets <- as.ppp(coordinates(markets), as.owin(polyg))

# Kernel Density Estimate (KDE)
dens <- density(sp_markets, sigma=10000, edge=TRUE, eps=500)
plot(dens)

# Computation to determine an “optimal” bandwidth (sigma)
bw.diggle(sp_markets)
##   sigma 
## 18971.5
# Adjusting color scheme and adding markets as points
plot(dens, col=heat.colors(10))
points(markets, pch=19, cex=0.2)

# Investigating data for attributes that might have overtly strong impact 
head(markets, n=20)
##     ID      MODNAME         COUNTY DEFYR BOROUGH VAL_1334 by1200
## 1   28   CANTERBURY           KENT   600       1     0.00      1
## 2    7       LONDON      MIDDLESEX   650       1 11000.00      1
## 3   34         YORK      YORKSHIRE   700       1  1620.00      1
## 4   32  SOUTHAMPTON      HAMPSHIRE   750       1   511.17      1
## 5   43    ROCHESTER           KENT   821       1     0.00      1
## 6   24    WORCESTER WORCESTERSHIRE   899       1   300.00      1
## 7    8  WALLINGFORD      BERKSHIRE   900       1    96.24      1
## 8   57       EXETER          DEVON   900       1   366.17      1
## 9  119      HALWELL          DEVON   900       1    13.50      1
## 10  29      LYDFORD          DEVON   900       1    11.67      1
## 11  72       PILTON          DEVON   900       1    10.00      1
## 12  46     BRIDPORT         DORSET   900       1    99.71      1
## 13  12  SHAFTESBURY         DORSET   900       1   200.00      1
## 14  50      WAREHAM         DORSET   900       1    63.50      1
## 15  17 CHRISTCHURCH      HAMPSHIRE   900       1    39.75      1
## 16  37  PORTCHESTER      HAMPSHIRE   900       1   119.01      1
## 17  16   WINCHESTER      HAMPSHIRE   900       1   515.17      1
## 18  33     AXBRIDGE       SOMERSET   900       1    45.00      1
## 19  36         BATH       SOMERSET   900       1   133.33      1
## 20  92     LANGPORT       SOMERSET   900       1    40.00      1
# We find out that mean value has very big spread (up to £11,000)
summary(markets$VAL_1334)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    -1.00     0.00    40.00    70.86    83.92 11000.00
# The wealth of London is statistical outlier and hides the weaker patterns in comparison
boxplot(markets$VAL_1334)

# boxplot.stats gives us the lower whisker or minimum, the interquartile range of the middle (the second and fourth values), the median (third value) and the upper whisker or maximum. 
boxplot.stats(markets$VAL_1334)
## $stats
## [1]  -1.000   0.000  40.000  83.915 205.000
## 
## $n
## [1] 1947
## 
## $conf
## [1] 36.99521 43.00479
## 
## $out
##  [1] 11000.00  1620.00   511.17   300.00   366.17   515.17   220.00   800.00
##  [9]   300.00   261.38   358.83   370.50   540.75  1000.00   946.00   913.92
## [17]   604.54   265.44   270.00   466.09   645.25   292.50   227.36   382.88
## [25]  2200.00   280.50   360.00   293.06  1000.00   249.06   349.00   250.00
## [33]   242.50   266.67   210.00   245.75   600.30   390.12   213.19   270.00
## [41]   226.52   244.12   211.62   500.00   268.97   750.00   266.63   255.00
## [49]   333.33   240.52   255.00   262.30   266.63   330.00   409.50   260.00
## [57]   270.00   232.50   232.88   412.13   235.00   300.00   315.00   341.30
## [65]   750.08   398.15   357.61   232.50   221.25   285.00   270.00  1100.00
## [73]   211.04   210.00   495.00   375.00   250.00   225.00   210.00   480.00
## [81]   228.81   257.16   232.50   247.50   532.50   225.00   333.33   450.00
## [89]   450.00   225.00   400.65   315.00   240.00   220.75   247.42   412.50
## [97]   675.00
# Subset to get rid of the high outliers to illustrate "weaker" patterns, original data is saved as markets_backup
markets_backup <- markets

markets_major <- subset(markets, VAL_1334 > 205)
markets <- subset(markets, VAL_1334 < 205)
sp_markets <- as.ppp(coordinates(markets),as.owin(polyg))

# Recalclulating KDE
dens_weighted <- density(sp_markets, sigma=10000,
weights=markets$VAL_1334, edge=TRUE, eps=500)
plot(dens_weighted, col=topo.colors(50))
points(markets, pch=19, cex=0.1)
points(markets_major, pch=15, cex=0.7, col="red")

Relative risk surface

# Setting up
markets <- subset(markets_backup, DEFYR<1251)
sp_markets <- as.ppp(coordinates(markets),as.owin(polyg))

# Subsetting data
earlier <- as.ppp(coordinates(markets[markets$by1200=="1",]),as.owin(polyg))
later <- as.ppp(coordinates(markets[markets$by1200=="0",]),as.owin(polyg))
multit <- sp_markets
marks(multit) <- as.factor(markets$by1200)

# Plotting the new multitype object
plot(multit)

# Density estimate of all the markets by 1250
dens1250 <- density(sp_markets, sigma=25000, edge=TRUE,
eps=500)
plot(dens1250, col=topo.colors(50))
points(markets, pch=19, cex=0.2)

# Creating relative risk surface that contrasts earlier markets to those founded later

rrs <- relrisk(multit, sigma=25000, edge=TRUE, eps=500)
rrs[as.matrix(dens1250)<(0.000000001)] <- NA
dev.new(device=pdf, height=6, width=6)
plot(polyg)
plot(rrs, col=rev(topo.colors(50)), add=T)
plot(multit, cex=0.3, add=T)

# Plotting side by side still doesn't work in the webpage version... Might need to find other solutions for that!

Nearest Neighbour

# Distances between market sites and their closest neighbours
nndist(sp_markets)
##   [1]  3560.8988   761.5773 11681.1814 12067.3112  6325.3458  2701.8512
##   [7]   806.2258 10022.4747  5658.6217 10673.7997   806.2258 13928.3883
##  [13] 11092.3397  6685.0580 12567.4182  3275.6679 11360.0176  2690.7248
##  [19]  8237.7181  7060.4532  8202.4387  7723.3412   761.5773  2920.6164
##  [25]  5885.5756  9762.1719 10478.5495  2002.4984  7021.3959  7140.0280
##  [31]  4712.7487 11720.0683  1131.3708  3807.8866  9100.5494  5787.0545
##  [37]  9420.1911  7125.3070 11562.4392  8417.2442  9501.5788  9264.9879
##  [43]  7416.8727 12465.1514 10218.1212  9425.4973  3605.5513 14712.2398
##  [49] 11260.9946 14090.0674  7864.4771  3492.8498 15781.3181  9167.8787
##  [55] 10791.2001  4410.2154  9025.5194  7186.7934 14327.9447  7102.1124
##  [61] 12745.5875 10332.4731  6224.1465  6573.4314   509.9020   509.9020
##  [67]   806.2258  3757.6588 10720.0746  4494.4410 12149.8971  9827.0036
##  [73] 10107.9177  8134.4945  1220.6556  7723.3412   943.3981  3244.9961
##  [79]  7467.2619  4632.4939  4775.9816 10176.9347  2915.4759  5557.8773
##  [85]  4904.0799  6854.1958  3966.1064 14489.9965  7496.6659  1878.8294
##  [91] 17084.7886  4410.2154  3744.3290  8772.6849  4936.5980  9004.9986
##  [97]  3640.0549  5408.3269  7605.9187  5366.5631  4967.8969  6888.3960
## [103] 22274.8737  9372.2996  2121.3203  5124.4512  4441.8465  6835.9345
## [109]  4472.1360  3189.0437  7061.1614  7655.7168 11315.9180  3944.6166
## [115]  3613.8622  6627.2166  4527.6926  9104.9437  6989.2775  6824.9542
## [121]  9712.8780 10381.7147  3448.1879  3512.8336  7831.3473  8130.1906
## [127]  6407.8077 10978.6156  3701.3511  1649.2423  3560.8988  6135.1447
## [133] 10252.8045  8198.7804  2236.0680  8099.3827  7657.6759 11101.8017
## [139]  4110.9610  6977.1054  6977.1054 14259.3829  3080.5844  3534.1194
## [145]  6859.3003  3534.1194  5964.0590  2816.0256  9272.0009 11800.4237
## [151]  7360.0272  2765.8633  6774.2158 10507.6163  7134.4236  8628.4413
## [157]  4393.1765  9687.6210  4967.8969  9700.0000  6264.9820  4272.0019
## [163]  2920.6164   943.3981  3584.6897  4327.8170  8415.4620  2002.4984
## [169] 12298.3739 11303.0969  6200.8064  5869.4122  7496.6659  2683.2816
## [175]  6529.9311 13000.0000  8249.2424 21226.8698 14534.4419 16949.6313
## [181] 14933.5193  2765.8633  4188.0783  5412.0237 10673.7997  6754.2579
## [187] 12067.3112  6573.4314 11884.8643  6835.9345 14222.5174  2716.6155
## [193]  7071.0678  7003.5705  5536.2442 11500.4348  7125.3070  9305.9121
## [199]  6768.3085  4916.2994  7580.2375  8490.5830 21497.9069  5420.3321
## [205]   728.0110 17415.5103  4600.0000  4707.4409  9904.5444  4272.0019
## [211] 11503.9124  3757.6588  6609.8411  1104.5361  5249.7619  9642.0952
## [217]  6603.7868  4775.9816  4775.9816  2408.3189   360.5551  9848.8578
## [223]  5161.3952 15305.2279  4300.0000  4632.4939  2900.0000  7071.0678
## [229]  8102.4688  6365.5322  5015.9745  9660.7453  5437.8304  9774.4565
## [235] 10495.7134  4161.7304  5770.6152   806.2258 11088.7330  6107.3726
## [241] 10785.6386  9394.1471  4494.4410  5656.8542  9035.4856  3006.6593
## [247] 10186.7561  5724.5087  7343.0239  6053.9243  6664.0828  5408.3269
## [253] 19780.0404  2915.4759  9386.1600  9702.0616  7782.0306  6529.9311
## [259] 11798.7287  2193.1712  5905.9292 10178.8997  2236.0680  4909.1751
## [265] 12523.9770  8854.3774  6964.1941 11500.4348  7337.5745  8731.5520
## [271]  3901.2818  6390.6181  2701.8512  4472.1360 16609.0337 14408.6779
## [277]  2961.4186 12520.7827 11884.8643  6242.5956  6977.8220 28142.6722
## [283] 15061.5404 11810.5885  3138.4710 18694.6516  2720.2941  7496.6659
## [289]  8246.2113  2408.3189  6325.3458  4785.3944  4770.7442  7580.2375
## [295]  2200.0000 16865.6456  5403.7024 14013.2081 10609.9010  9786.2148
## [301] 11961.6052 13914.0217 11403.9467  9848.8578  7655.7168  4272.0019
## [307] 11112.6055 12998.8461 22588.7140  7653.1039 12523.9770  6390.6181
## [313]  7375.6356  8202.4387  7134.4236 13805.7959  5643.5804  4580.3930
## [319]  9700.0000 10541.8215  3178.0497  3178.0497  5099.0195 11363.5382
## [325]  3701.3511  8127.1151  8357.0330  1878.8294  8381.5273  5142.9563
## [331]  2941.0882 18304.3711  2220.3603  6768.3085  2927.4562  7849.2038
## [337]  8183.5200  2662.7054 12870.1204  7343.7048  3436.5681  7140.0280
## [343]  3014.9627 11360.0176 11335.7840 10044.8992  5162.3638  6965.6299
## [349] 10609.9010  2435.1591  9209.7774  2927.4562  3612.4784  7905.6942
## [355] 14616.7712  3712.1422  4560.7017  7580.2375  3584.6897 13905.3946
## [361] 16949.6313  7725.2832 41618.3854  8324.0615  7273.2386  4327.8170
## [367]  6841.0526  8637.1292 12316.6554  3482.8150  8102.4688  3640.0549
## [373]  3613.8622  5458.9376   316.2278 11637.8692  1220.6556  2884.4410
## [379]  3569.3137  3080.5844  5124.4512  9135.0972 12093.3866  5787.0545
## [385]  5608.0300 10623.0881  3671.5120 15970.5980  2729.4688  1523.1546
## [391]  1523.1546 12851.4591 10978.6156  8183.5200  3940.8121 12349.0890
## [397]  7200.6944  4596.7380  6519.2024 14302.4473  8402.3806  8876.9364
## [403] 10920.1648  5280.1515 11800.4237  1615.5494  7849.2038  3275.6679
## [409]  4565.0849  4701.0637  6627.2166  6440.4969  3436.5681  6551.3357
## [415]  2941.0882  8489.9941 11313.7085  7720.1036  8345.0584  8292.7679
## [421]  1104.5361  6135.1447 10978.6156  2549.5098  2549.5098  9272.0009
## [427]  3492.8498  2683.2816  5015.9745  6860.0292  1941.6488  5813.7767
## [433]  5675.3854  2039.6078 20113.6769 17083.6179 15015.3255  7273.2386
## [439]  5590.1699 10867.3824 10347.9467 15970.5980  9775.4795  7000.0000
## [445] 10728.0007  6177.3781  7988.1162  9802.0406  6041.5230 13579.3962
## [451]  7060.4532  8854.3774 21384.5739  1131.3708  3512.8336 11205.8021
## [457]  6087.6925  6718.6308  7203.4714  2780.2878  7715.5687  5981.6386
## [463]  7209.0221  1581.1388 15781.3181 10466.1359  6685.0580  7576.9387
## [469]  7359.3478  6200.8064  8527.6022  6664.0828  8130.1906   728.0110
## [475]  4964.8766  7615.7731  7102.1124  3828.8379  5830.9519  5053.7115
## [481]  1615.5494  4909.1751 10016.9856  2961.4186  4441.8465  6791.1707
## [487]  7810.2497  4527.6926  6389.0531 12126.4174  1581.1388  1581.1388
## [493]  3569.3137  5408.3269 10400.4808  5841.2327  5984.1457 10245.9748
## [499]  2816.0256  1900.0000  3546.8296  5885.5756  2720.2941  6824.9542
## [505]  6519.2024  9300.5376  2376.9729  7839.0050  4301.1626  5508.1757
## [511]  5420.3321 13787.6757  3448.1879 10207.8401  5412.0237  2039.6078
## [517]  4743.4165 12103.7184  8640.0231  8099.3827  5456.1891  4104.8752
## [523]  4104.8752  7203.4714  3712.1422  7337.5745  1923.5384  5408.3269
## [529] 10241.5819  8895.5045  9617.6920  5000.0000  5215.3619  2561.2497
## [535] 10283.9681 10840.6642  5675.3854  6791.1707   806.2258  7690.2536
## [541]  4429.4469  8538.1497  3228.0025  3244.9961 10937.0928  4743.4165
## [547] 11360.0176  9482.6157  8998.8888  9386.1600 11101.8017  5590.1699
## [553]  7420.2426  2561.2497  2121.3203  5714.0179  7022.8199  3757.6588
## [559]  5536.2442  6801.4704 15652.4758 14241.1376 18298.9071  5661.2719
## [565]  4964.8766  7782.0306  9489.4678  2039.6078  5408.3269 11822.4363
## [571]  9712.8780  9167.8787  3000.0000  6365.5322  4785.3944  4617.3586
## [577]  8747.5711  7378.3467  8185.9636  5220.1533  6685.0580  8188.4064
## [583]  7022.8199  6403.1242 10793.5166  5688.5851 11415.7786   360.5551
## [589]  5714.0179 13173.4582  3689.1733 11381.1247  6942.6220  7217.3402
## [595]  2662.7054  7368.1748  7580.2375  6609.8411  3228.0025 10183.3197
## [601]  5280.1515  5124.4512  6618.9123  3676.9553  3860.0518  7702.5970
## [607]  8402.3806  3465.5447  4300.0000  7702.5970  3546.8296  4763.4021
## [613] 11415.7786 15911.3167  8122.1918 13914.0217 11303.0969 16804.7612
## [619] 15365.2205  7690.2536  8792.0419  7375.6356 10704.2048 11317.6853
## [625]  6841.0526  3448.1879  1581.1388  5930.4300  3982.4616  5508.1757
## [631] 10336.8274  6146.5437  2900.0000  7119.6910  7470.6091  4712.7487
## [637]  5824.0879  5608.0300 10245.9748  4560.7017  3275.6679  3275.6679
## [643]  6185.4668  7496.6659  9386.1600  6403.1242   500.0000  1923.5384
## [649]  9035.4856  4429.4469  6573.4314 12567.4182  9025.5194  3360.0595
## [655]  4924.4289  6107.3726 10212.2475 10555.0936  2941.0882  3569.3137
## [661]  2973.2137  7470.6091  7800.6410  6800.0000  4161.7304 10131.6336
## [667]  6596.9690  6920.2601  2061.5528  9276.3139  7200.6944 10430.7238
## [673]  6888.3960  8080.8415  6236.9865  3138.4710  6041.5230  4936.5980
## [679]  5948.1089  2563.2011  1649.2423  2563.2011  2729.4688  2884.4410
## [685]  5807.7534  5333.8541 11562.4392  7340.9809  9104.9437  5981.6386
## [691]  6306.3460  4272.0019  6920.2601  6573.4314 13978.9127  2720.2941
## [697] 11063.4533 12949.5174  5360.0373 11205.8021 19865.5481  5021.9518
## [703]  3275.6679  3901.2818 35913.5072  5142.9563  7420.2426  8628.4413
## [709]  3448.1879  5590.1699  7596.0516  2061.5528  3828.8379  4825.9714
## [715]  1252.9964  9420.1911  2376.9729  5700.8771  5053.7115  1252.9964
## [721] 11173.1822  5643.5804  8237.7181  8134.4945  5326.3496  6842.5142
## [727]  5246.9038  5824.0879  7831.3473  6802.9405  3689.1733  4580.3930
## [733]  7715.5687  5326.3496  7430.3432  4775.9816  3757.6588  7368.1748
## [739]  7494.6648  2039.6078  6640.7831  5458.9376 16275.4416  6000.0000
## [745]  8570.8809  9424.4363  9353.6089  4301.1626 11101.8017   500.0000
## [751]  3275.6679  4186.8843  6854.1958  5557.8773  5300.9433 13822.4455
## [757] 11086.0272 14241.1376  8122.1918   316.2278  4976.9469  9037.6988
## [763]  9213.0342  3982.4616  3257.2995  8421.9950  3257.2995  1513.2746
## [769]  6706.7131  9425.4973 10751.7440  3671.5120 10183.3197  5408.3269
## [775]  6989.2775  3014.9627  5948.1089  6965.6299  2780.2878  3080.5844
## [781]  4850.7731  7340.9809  5787.0545  3966.1064  9533.6247  6242.5956
## [787]  1941.6488  5161.3952  3023.2433  5916.9249  5021.9518  6389.0531
## [793]  5408.3269  9811.2181  9757.5612  5220.1533  6905.0706  8141.2530
## [799]  7119.6910  9771.8985  6264.9820  5787.0545 10867.3824  5000.0000
## [805]  4565.0849 14489.9965  4294.1821 11652.0384  4850.7731  1900.0000
## [811]  3577.7088  3482.8150  5162.3638  6276.9419  7627.5815 10589.1454
## [817] 10530.9069  6685.0580  3944.6166  7061.1614  2823.1188  7209.0221
## [823]  2720.2941 13225.7325  8236.5041  2435.1591  7343.0239  4712.7487
## [829]  7864.4771  8772.6849  8998.8888  7430.3432  6529.9311  7988.1162
## [835]  2823.1188   806.2258  4527.6926  4976.9469  3006.6593  9761.6597
## [841]  4770.7442 11926.8604  6224.1465  7420.2426  6964.1941  8598.2556
## [847]  5099.0195 11264.1023  4382.9214  3744.3290  1513.2746  4596.7380
## [853] 13905.3946  8631.3383  6946.2220  6000.0000  9104.9437 11573.2450
## [859]  3828.8379  2941.0882  5333.8541  3080.5844  7905.6942  3577.7088
## [865]  3605.5513  6802.9405  3676.9553  6888.3960  2193.1712  9633.7947
## [871]  2927.4562  8229.8238 12480.7852 14449.9135 25784.1036  4904.0799
## [877]  4904.0799  4924.4289  6029.9254  3023.2433  3006.6593 21384.5739
## [883]  9353.6089
# Visualizing the data and taking the mean
hist(nndist(sp_markets), breaks=seq(0, 50000, 1000))

mean(nndist(sp_markets))
## [1] 7152.382
# Clark and Evans test: 
# R value of 1.08 tells the sites are slightly dispersed, and p << 0.05 tells us that the result is statistically significant
clarkevans.test(sp_markets, corrections="none")
## 
##  Clark-Evans test
##  No edge correction
##  Z-test
## 
## data:  sp_markets
## R = 1.0819, p-value = 3.206e-06
## alternative hypothesis: two-sided
# Calling back the "markets as events" data from earlier, creating a subset of those that were recorded before 1250
# R value of 0.96 tells us that the data is slightly clustered
market_events1250 <- subset(market_events, DEFYR<1251)
sp_market_events <-
as.ppp(coordinates(market_events1250),as.owin(polyg))
hist(nndist(sp_market_events), breaks=seq(0, 50000, 1000))

mean(nndist(sp_market_events))
## [1] 6117.172
clarkevans.test(sp_market_events, corrections="none")
## 
##  Clark-Evans test
##  No edge correction
##  Z-test
## 
## data:  sp_market_events
## R = 0.96332, p-value = 0.02997
## alternative hypothesis: two-sided
# This final code is not needed in RStudio or the GitHub webpage
# save.image("week2_sade.RData")

3: Point Pattern Analysis & Regression Analysis

Poisson Distribution & Complete Spatial Randomness

# Setting up
setwd("~/R/SADE/week3")
library(rgdal)
library(raster)
library(spatstat)
library(maptools)
library(GISTools)

# Creating example data for Complete Spatial Randomness (CRS)
# Every time "plot" command is run, the pattern changes to a newly generated random distribution
window <- owin(c(0,10000), c(0,10000))
plot(runifpoint(n=1000, win=window))

# Setting seed allows everyone to see the same pattern 
set.seed(1)
plot(runifpoint(n=1000, win=window))

# Loading the study area (a 10 x 10 grid)
grid <- readOGR(dsn="grid", layer="grid")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week3\grid", layer: "grid"
## with 100 features
## It has 1 fields
plot(grid)

# Plotting the earlier random distribution to this grid illustrates that there is varying amounts of points in each cell (Poisson distribution)
set.seed(1)
randomp <- as.SpatialPoints.ppp(runifpoint(n=1000, win=grid))
crs(randomp) <- crs(proj4string(grid))
points(randomp, cex=0.5)

# Calculating the number of points in each cell and mean value of points per cell, and plotting histogram to illustrate the distribution 
hist(poly.counts(randomp,grid), xlim=c(0, 25))

mean(poly.counts(randomp,grid))
## [1] 10
# Creating a curve for idealised distribution for mean = 10
plot(dpois(x=0:20, lambda=10))
lines(dpois(x=0:20, lambda=10), col="blue")

Nearest Neighbour Revisited

# Loading the exercise data: a map depicting "nucleated" settlements (e.g. villages and towns)
polyg <- readOGR(dsn="england", layer="england_historic")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week3\england", layer: "england_historic"
## with 1 features
## It has 3 fields
dem <- raster("dem_england/dem_england_historic.tif")
settl <- readOGR(dsn="nucleations", layer="Nucleations")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week3\nucleations", layer: "Nucleations"
## with 10513 features
## It has 3 fields
plot(polyg)
plot(dem, add=T, col=terrain.colors(10))
points(settl, pch=19, cex=0.1)

# Omitting points that fall outside of the polygon and converting the data into spastat object
settl <- settl[polyg, ]
sp_settl <- as.ppp(coordinates(settl),as.owin(polyg))

# Histogram of distribution of distances and mean value
hist(nndist(sp_settl), xlim=c(0,6000), breaks=100)

mean(nndist(sp_settl))
## [1] 2059.87
# Clark and Evans test
# R > 1 (and p << 0.05) means that the sites are dispersed
clarkevans.test(sp_settl, corrections="all")
## 
##  Clark-Evans test
##  No edge correction
##  Z-test
## 
## data:  sp_settl
## R = 1.1709, p-value < 2.2e-16
## alternative hypothesis: two-sided

K-function, L-function and Pair Correlation Function (PCF)

# RECALLING PREVIOUS POLYGON TO THIS R CHUNK, OTHERWISE ADDING THE NEW SUBAREAS DIDN'T WORK
polyg <- readOGR(dsn="england", layer="england_historic")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week3\england", layer: "england_historic"
## with 1 features
## It has 3 fields
dem <- raster("dem_england/dem_england_historic.tif")
settl <- readOGR(dsn="nucleations", layer="Nucleations")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week3\nucleations", layer: "Nucleations"
## with 10513 features
## It has 3 fields
plot(polyg)
plot(dem, add=T, col=terrain.colors(10))
points(settl, pch=19, cex=0.1)
settl <- settl[polyg, ]
sp_settl <- as.ppp(coordinates(settl),as.owin(polyg))

# Loading two polygons describing the new study areas
south <- readOGR(dsn="southernengland", layer="eng_south")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week3\southernengland", layer: "eng_south"
## with 1 features
## It has 1 fields
mid <- readOGR(dsn="midlands", layer="midlands")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week3\midlands", layer: "midlands"
## with 1 features
## It has 1 fields
plot(south, border="red", add=T)
plot(mid, border="cyan", add=T)

# Creating objects that contain settlements that fall into these regions & creating spastat objects
south_settl <- settl[south, ]
sp_south_settl <- as.ppp(coordinates(south_settl),as.owin(south))
points(sp_south_settl, pch=19, cex=0.1, col="red")

mid_settl <- settl[mid, ]
sp_mid_settl <- as.ppp(coordinates(mid_settl),as.owin(mid))
points(sp_mid_settl, pch=19, cex=0.1, col="cyan")

# Plotting the point distributions
plot(south, main="Avon region")
plot(dem, add=T, col=terrain.colors(20))
points(sp_south_settl, pch=19, cex=0.5)

plot(mid, main="Midlands")
plot(dem, add=T, col=terrain.colors(20))
points(sp_mid_settl, pch=19, cex=0.5)

# Running Clark & Evants test 
# Visually we can see that there is clustering of settlements near the waterways in "South", so the "dispersed" (R > 1) result for both of them seems not to capture the full picture 
clarkevans.test(sp_south_settl, correction="none")
## 
##  Clark-Evans test
##  No edge correction
##  Z-test
## 
## data:  sp_south_settl
## R = 1.1343, p-value = 5.336e-12
## alternative hypothesis: two-sided
clarkevans.test(sp_mid_settl, correction="none")
## 
##  Clark-Evans test
##  No edge correction
##  Z-test
## 
## data:  sp_mid_settl
## R = 1.328, p-value < 2.2e-16
## alternative hypothesis: two-sided
# K function 
# "Pois" is the expected Poisson line, the others are how the data compares to it with a few different edge corrections
k_func_south <- Kest(sp_south_settl)
dev.new(device=pdf)
plot(k_func_south, xlim=c(0,5000), main="K-Function South close-up")

# L function (Same as K but Pois is straightened)
l_func_south <- Lest(sp_south_settl)
plot(l_func_south, xlim=c(0,5000))

# PCF (pair correlation function)
# Same principle as K, but different process ("donut rings")
# Expected Poisson is the green horizontal line
pc_func_south <- pcf(sp_south_settl)
plot(pc_func_south, xlim=c(0,5000))

Monte Carlo Simulation

# Monte Carlo Simulation
# The grey envelope depicts results of random point patterns simulations in the study area, the line shows where our data deviates from this area of random distribution 
pc_func_100_south <- envelope(sp_south_settl, pcf, nsim=99)
## Generating 99 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.
## 
## Done.
plot(pc_func_100_south, xlim=c(0,20000), main="South: PCF with 99 MC Simulations")

# Refined simulation with 999 simulations, and top and bottom 25 % removed in case of statistical outliers
# REDUCED nsim to 99 for the GitHub site as it takes too long to calculate again and again when knitting the html page
pc_func_1000_south <- envelope(sp_south_settl, pcf, nsim=99,
nrank=25)
## Generating 99 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.
## 
## Done.
plot(pc_func_1000_south, xlim=c(0,20000), main="South: PCF with 999 MC Simulations")

# Comparing to Midlands data: R value barely gets to random pattern envelope, settlement patterns seem to be not clustered
pc_func_1000_mid <- envelope(sp_mid_settl, pcf, nsim=99,
nrank=25)
## Generating 99 simulations of CSR  ...
## 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
## 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
## 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.
## 
## Done.
plot(pc_func_1000_mid, xlim=c(0,5000), main="Midlands: PCF with 999 MC Simulations")

Linear Regression

# Loading data: location of Oxford and Oxford pottery found
sites <- readOGR(dsn="pottery_sites", layer="pottery_sites")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week3\pottery_sites", layer: "pottery_sites"
## with 30 features
## It has 3 fields
oxford <- readOGR(dsn="oxford/oxford.shp", layer="oxford")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week3\oxford\oxford.shp", layer: "oxford"
## with 1 features
## It has 2 fields
## Integer64 fields read as strings:  cat Id
plot(polyg)
points(oxford, pch=15, cex=2)
points(sites)

# Inspecting site names
head(sites, n=30)
##               site_name oxpots id
## 1              Wroxeter   5.50  1
## 2            Droitwhich  10.75  2
## 3             Leicester   7.50  3
## 4            Gloucester  21.00  4
## 5              Caerwent  19.00  5
## 6              Gatcombe  21.00  6
## 7                  Bath  21.25  7
## 8           Cirencester  22.50  8
## 9             Alchester  22.50  9
## 10           Verulamium  17.25 10
## 11 Dorchester on Thames  22.50 11
## 12           Mildenhall  17.50 12
## 13           Silchester  19.50 13
## 14               London  18.25 14
## 15            Ilchester  12.00 15
## 16           Durrington  11.50 16
## 17            Salisbury   1.50 17
## 18           East Anton  14.75 18
## 19           Winchester   8.50 19
## 20           Clausentum   6.25 20
## 21           Dorchester   5.85 21
## 22          Portchester   9.00 22
## 23           Chichester   6.75 23
## 24           Canterbury  17.50 24
## 25          Richborough  17.50 25
## 26           Colchester   7.00 26
## 27              Caister   4.00 27
## 28             Norwhich   4.00 28
## 29             Pevensey   7.75 29
## 30     Brough on Humber   1.50 30
# Dependant variable = the proportion of Oxford pottery at each sites (oxpots)
# Independent variable = distance of the site from Oxford

# Calculating these distances:
# Step 1: combine coordinate data to single list 
coords <- rbind(coordinates(sites), coordinates(oxford))

# Step 2: distance matrix between sites
d_matrix <- as.matrix(dist(coords))

# Step 3: Adding "distox" as a column to "sites" data frame 
sites$distox <- d_matrix[31,1:30]

# Scatter plot
plot(sites$distox,sites$oxpots)

# Linear regression analysis (lm) and adding the result to the scatterplot
res <- lm(sites$oxpots ~ sites$distox)
abline(res, col="red")

# Summary of the linear regression analysis results
summary(res)
## 
## Call:
## lm(formula = sites$oxpots ~ sites$distox)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.9931  -4.0310  -0.6302   3.5874  10.8196 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.061e+01  2.261e+00   9.116 7.13e-10 ***
## sites$distox -7.490e-05  1.904e-05  -3.933 0.000503 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.693 on 28 degrees of freedom
## Multiple R-squared:  0.3559, Adjusted R-squared:  0.3329 
## F-statistic: 15.47 on 1 and 28 DF,  p-value: 0.0005027
# Residuals: adding new column to "sites"
sites$sr <- residuals(res) / summary(res)$sigma

# Taking a look at results with boxplot & scatterplot
boxplot(sites$sr)

plot(sites$distox,sites$sr)

# Mapping the residual values
plot(polyg)
points(oxford, pch=15, cex=2)
points(sites[sites$sr >=-3 & sites$sr <=-2,], pch=1,
cex=3, col="blue")
points(sites[sites$sr >=-2 & sites$sr <=-1,], pch=1,
cex=2, col="blue")
points(sites[sites$sr >=-1 & sites$sr <=0,], pch=1,
cex=1, col="blue")
points(sites[sites$sr >=0 & sites$sr <=1,], pch=1,
cex=1, col="red")
points(sites[sites$sr >=1 & sites$sr <=2,], pch=1,
cex=2, col="red")

4: Correspondence Analysis

Chi-squared Test

# Setting working directory
setwd("~/R/SADE/week4")

# Setting up a toy dataset
soils <- c("clay", "morainic", "peat")
monuments <- c(21, 7, 24)
pct_land <- c(0.45, 0.25, 0.3)
mydata <- cbind.data.frame(soils, monuments, pct_land)

mydata
##      soils monuments pct_land
## 1     clay        21     0.45
## 2 morainic         7     0.25
## 3     peat        24     0.30
# One-sample Chi-squared test
# Null hypothesis is that there is no significant relationship between the distribution of the monuments and the soil type areas they are found in
# Step 1: create column with the expected nr of monuments per soil area
mydata$expected <- mydata$pct_land * sum(mydata$monuments)
mydata
##      soils monuments pct_land expected
## 1     clay        21     0.45     23.4
## 2 morainic         7     0.25     13.0
## 3     peat        24     0.30     15.6
# Step 2: Chi-squared test
chisq.test(mydata$monuments, p=mydata$pct_land)
## 
##  Chi-squared test for given probabilities
## 
## data:  mydata$monuments
## X-squared = 7.5385, df = 2, p-value = 0.02307
# X-squared = 7.5385, df = 2, p-value = 0.02307
# -> We can reject the null hypothesis (of the distribution being random) with 97.7 % confidence

# Two-sample Chi-squared test
soils <- c("clay", "morainic", "peat")
houses <- c(6, 3, 19)
manu <- c(10, 3, 2)
ritual <- c(5, 1, 3)
pct_land <- c(0.45, 0.25, 0.3)
newdata <- cbind.data.frame(soils, houses, manu, ritual,
pct_land)

newdata
##      soils houses manu ritual pct_land
## 1     clay      6   10      5     0.45
## 2 morainic      3    3      1     0.25
## 3     peat     19    2      3     0.30
newdata2 <- data.frame(newdata$houses, newdata$manu,
newdata$ritual)

chisq.test(newdata2)
## 
##  Pearson's Chi-squared test
## 
## data:  newdata2
## X-squared = 12.919, df = 4, p-value = 0.01168
# X-squared = 12.919, df = 4, p-value = 0.01168
# p < 0.05, so we can reject null hypothesis

Correspondence Analysis with Roman Coins

# Setting up
# Note: had to replace row.names="hordes" to row.names=1 for the code to work
library(ca)
mydata <- read.csv(file="romanhoards.csv", header=TRUE, sep=",", row.names=1)
summary(mydata)
##     Antioch         Alexandria       Aquileia         Arles       
##  Min.   :  0.00   Min.   : 0.00   Min.   :  3.0   Min.   :  3.00  
##  1st Qu.:  0.25   1st Qu.: 0.00   1st Qu.:  4.5   1st Qu.:  5.00  
##  Median : 43.50   Median : 2.50   Median :  6.5   Median : 24.50  
##  Mean   : 89.50   Mean   : 9.00   Mean   : 32.0   Mean   : 70.83  
##  3rd Qu.:118.25   3rd Qu.:19.25   3rd Qu.: 16.0   3rd Qu.:116.00  
##  Max.   :321.00   Max.   :25.00   Max.   :153.0   Max.   :228.00  
##  Constantinopolis    Cyzicus          Heracleia          London  
##  Min.   :  0.00   Min.   :   1.00   Min.   :   0.0   Min.   : 1  
##  1st Qu.:  0.00   1st Qu.:   7.75   1st Qu.:   7.5   1st Qu.: 1  
##  Median :  0.50   Median :  22.50   Median :  21.5   Median :18  
##  Mean   : 44.67   Mean   : 208.00   Mean   : 213.8   Mean   :27  
##  3rd Qu.: 23.50   3rd Qu.:  89.75   3rd Qu.:  53.5   3rd Qu.:44  
##  Max.   :236.00   Max.   :1087.00   Max.   :1173.0   Max.   :77  
##     Lugdunum       Nicomedia           Rome            Ostia       
##  Min.   : 0.00   Min.   :  2.00   Min.   :  2.00   Min.   :0.0000  
##  1st Qu.: 0.00   1st Qu.:  5.25   1st Qu.: 14.00   1st Qu.:0.0000  
##  Median :21.00   Median : 12.50   Median : 28.00   Median :0.0000  
##  Mean   :31.67   Mean   : 92.00   Mean   : 57.67   Mean   :0.1667  
##  3rd Qu.:63.75   3rd Qu.: 76.75   3rd Qu.: 35.25   3rd Qu.:0.0000  
##  Max.   :77.00   Max.   :424.00   Max.   :241.00   Max.   :1.0000  
##      Siscia           Sirmium      Thessalonica       Ticinum     
##  Min.   :   9.00   Min.   : 0.0   Min.   :  10.0   Min.   :  4.0  
##  1st Qu.:  14.75   1st Qu.: 0.0   1st Qu.:  18.0   1st Qu.:  7.5  
##  Median :  30.00   Median : 1.0   Median :  21.5   Median : 18.5  
##  Mean   : 714.00   Mean   :14.5   Mean   : 311.5   Mean   :109.3  
##  3rd Qu.:  42.25   3rd Qu.: 2.0   3rd Qu.:  32.5   3rd Qu.: 73.0  
##  Max.   :4159.00   Max.   :83.0   Max.   :1763.0   Max.   :520.0  
##      Trier          uncertain         Total        
##  Min.   :  0.00   Min.   : 0.00   Min.   :   74.0  
##  1st Qu.:  0.25   1st Qu.: 0.00   1st Qu.:  328.8  
##  Median :105.00   Median : 5.50   Median :  723.5  
##  Mean   :175.33   Mean   :26.00   Mean   : 2227.0  
##  3rd Qu.:333.50   3rd Qu.:52.25   3rd Qu.:  975.0  
##  Max.   :467.00   Max.   :79.00   Max.   :10585.0
# Using t() to flip rows to columns
t.mydata <- t(mydata)
t.mydata
##                  Tavistock Sq. Chavannes Nagyteteny Bulgaria Jezzine Nebek
## Antioch                      0         0        129        1     321    86
## Alexandria                   0         0         24        0      25     5
## Aquileia                     7        19        153        3       6     4
## Arles                       41       141        228        3       8     4
## Constantinopolis             1         0        236        0      31     0
## Cyzicus                      4         1       1087       19     111    26
## Heracleia                    5         0       1173       15      62    28
## London                      47        77         35        1       1     1
## Lugdunum                    71        77         42        0       0     0
## Nicomedia                    5         2        424        6      96    19
## Rome                        26        30        241        2      37    10
## Ostia                        0         0          0        0       0     1
## Siscia                      31        46       4159        9      29    10
## Sirmium                      0         2         83        0       2     0
## Thessalonica                21        22       1763       10      36    17
## Ticinum                     22        90        520        4      15     5
## Trier                      375       467        209        1       0     0
## uncertain                   11        66         79        0       0     0
## Total                      667      1040      10585       74     780   216
# Removing the "total" row from the analysis data
t.mydata <- t.mydata[-c(19),]
summary(t.mydata)
##  Tavistock Sq.      Chavannes        Nagyteteny        Bulgaria     
##  Min.   :  0.00   Min.   :  0.00   Min.   :   0.0   Min.   : 0.000  
##  1st Qu.:  1.75   1st Qu.:  0.25   1st Qu.:  80.0   1st Qu.: 0.000  
##  Median :  9.00   Median : 20.50   Median : 218.5   Median : 1.500  
##  Mean   : 37.06   Mean   : 57.78   Mean   : 588.1   Mean   : 4.111  
##  3rd Qu.: 29.75   3rd Qu.: 74.25   3rd Qu.: 496.0   3rd Qu.: 5.500  
##  Max.   :375.00   Max.   :467.00   Max.   :4159.0   Max.   :19.000  
##     Jezzine           Nebek      
##  Min.   :  0.00   Min.   : 0.00  
##  1st Qu.:  1.25   1st Qu.: 0.25  
##  Median : 20.00   Median : 4.50  
##  Mean   : 43.33   Mean   :12.00  
##  3rd Qu.: 36.75   3rd Qu.:15.25  
##  Max.   :321.00   Max.   :86.00
# CA from the data, excluding the "uncertain" and "Total" columns
mydata.ca <- ca(mydata[,1:17])
summary(mydata.ca)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.560747  61.8  61.8  ***************          
##  2      0.325160  35.8  97.6  *********                
##  3      0.012822   1.4  99.0                           
##  4      0.005746   0.6  99.6                           
##  5      0.003557   0.4 100.0                           
##         -------- -----                                 
##  Total: 0.908032 100.0                                 
## 
## 
## Rows:
##     name   mass  qlt  inr     k=1 cor ctr    k=2 cor ctr  
## 1 | TvsS |   50  969  250 | -2093 960 388 |  206   9   6 |
## 2 | Chvn |   74  980  303 | -1904 973 477 |  155   6   5 |
## 3 | Ngyt |  796 1000   94 |   251 584  89 | -212 416 109 |
## 4 | Blgr |    6   77    4 |   235  76   1 |  -15   0   0 |
## 5 | Jzzn |   59  995  277 |   590  82  37 | 1972 914 706 |
## 6 | Nebk |   16  930   72 |   545  74   9 | 1850 855 172 |
## 
## Columns:
##      name   mass  qlt  inr     k=1 cor ctr    k=2 cor ctr  
## 1  | Antc |   41  999  300 |   669  67  32 | 2498 932 780 |
## 2  | Alxn |    4  986   15 |   581  99   2 | 1737 887  38 |
## 3  | Aqul |   15  186    1 |   -42  40   0 |  -80 146   0 |
## 4  | Arls |   32  780   38 |  -910 780  48 |   21   0   0 |
## 5  | Cnst |   20  727    5 |   375 699   5 |   75  28   0 |
## 6  | Cyzc |   95  900   16 |   370 882  23 |   53  18   1 |
## 7  | Hrcl |   97  933   15 |   353 863  22 | -100  70   3 |
## 8  | Lndn |   12  990   52 | -1936 980  82 |  195  10   1 |
## 9  | Lgdn |   14  983   65 | -2001 977 103 |  163   6   1 |
## 10 | Ncmd |   42  966   16 |   388 431  11 |  432 535  24 |
## 11 | Rome |   26  966    2 |   -90 107   0 |  256 859   5 |
## 12 | Osti |    0  184    5 |   728   9   0 | 3245 175   2 |
## 13 | Sisc |  324  983   68 |   285 429  47 | -324 554 105 |
## 14 | Srmm |    7  934    1 |   279 485   1 | -268 449   1 |
## 15 | Thss |  142  990   22 |   278 554  19 | -247 437  26 |
## 16 | Tcnm |   50  496    5 |  -152 266   2 | -141 230   3 |
## 17 | Trir |   80  997  375 | -2058 990 602 |  175   7   8 |
# The first two dimensions explain cumulatively 97.6 % of the variation
# So we can quite comfortably plot the data into 2-dimensional graph!
plot(mydata.ca)

# Plotting just the rows or just the columns with "what" argument
plot(mydata.ca, what=c("none","all"), main="Columns")

plot(mydata.ca, what=c("all","none"), main="Rows")

# Selecting which CA dimensions to plot to see it from different angles
plot(mydata.ca, dim=c(1,3))

Correspondence Analysis with Burial Data

# Reading data (had to tweak the row.names argument again - apparently it got translated to "ï..pottery" instead of "pottery")
baxter <- read.csv(file="baxter_burials.csv", header=TRUE, sep=",", row.names="ï..pottery")

# Exploring the data
structure(baxter)
##     a  b  c d e  f g  h  i j  k  l  m n  o  p
## 1   0  0  0 0 0  0 2  2  1 1  0  0  0 1  5  3
## 2   0  0  0 0 0  0 1  0  0 1  1  7  0 0  0  0
## 3   0  2  0 0 0  1 0  1  0 0  0  0  0 0  0  0
## 4   0  8  0 2 3  0 0  0  0 0  0  0  0 0  0  0
## 5   0  0  0 1 0  0 0  0  0 0  1  0  0 0  0  0
## 6   0  0  0 0 0  1 0  0  0 0  1  0  0 0  0  0
## 7   0  4  0 0 2  0 0  1  0 0  0  0  1 0  2  0
## 8   0  1  1 1 0  0 0  0  0 0  0  1  0 0  0  0
## 9   0  7  6 1 2  0 0  0  0 1  3  3  0 0  1  1
## 10  3  6  1 1 3  3 2  6  2 2 12  5  2 1  2  1
## 11  0  8  0 4 4  0 0  0  0 0  0  0  0 0  0  0
## 12  1  1  0 0 0  0 0  2  0 0  1  0  0 0  0  0
## 13  0  2 12 0 0  0 1  0  0 0  1  0  0 0  0  0
## 14  0  1  0 0 0  0 0  2  0 0  2  0  0 0  0  0
## 15  0  0  0 0 0  0 1  1  1 1  1  0  7 3  4  2
## 16  1  1  0 0 0  1 2  0  1 0  4  1  9 4  0  0
## 17  0  0  0 0 0  0 0  0  0 0  7  0  0 0  1  0
## 18  2  2  0 0 0  1 0  0  0 0  3  0  0 0  0  0
## 19  0  0  0 0 0  0 0  1  3 1  0  2  3 2  7  0
## 20  0  5  0 0 2  0 0  0  0 0  1  0  0 0  0  0
## 21  0  0  0 0 0  0 1  0  0 0  2  0  1 0  4  0
## 22  1  0  0 0 0  2 0  0  0 0  0  1  0 0  0  0
## 23  0  1  0 3 0  0 0  0  0 0  0  0  0 0  0  0
## 24  0  3  0 0 3  0 0  0  0 0  0  0  0 0  0  0
## 25  1  8 26 3 1  0 0  0  0 0  0  0  0 0  0  1
## 26  0  0  0 0 0  0 0  2  1 2  0  0 13 1  4  4
## 27  1  4  0 0 0  2 0  1  2 4 14  2  1 3  0  0
## 28  1  1 13 1 0  2 0  0  0 0  0  0  0 0  0  0
## 29  0  3 14 0 0  0 1  0  0 0  1  0  0 0  0  0
## 30  1  6  0 0 0  1 1  1  0 0  4  0  0 0  1  1
## 31 10 14  2 0 0 11 6 16  3 5 24 27  4 5  2  2
## 32  1  0  0 0 0  3 0  0  0 0  0  1  0 0  0  0
## 33  0  0  0 0 0  0 0  0  0 1  5  2  0 0  0  0
## 34  0  4  0 0 0  1 0  0  0 0  4  1  2 1  7  0
## 35  4  5  0 0 0  1 0  1  0 0  3  0  0 0  0  0
## 36  0  0  0 0 0  0 1  0  1 1  1  0  0 2  2  1
## 37  0  0  0 0 0  0 0  0  3 0  0  1  1 4  0  1
## 38  0  0  0 0 0  0 0  0  2 0  0  0  0 0  0  0
## 39  2 13  0 2 2  5 7  5  3 5 15  5  0 4  6  3
## 40  0  0  0 0 0  0 0  0  0 0  0  0  0 0  3  1
## 41  5  9  3 4 0  5 7  7 11 9 13 13 14 8 11 12
## 42  0  1  0 0 0  0 0  0  0 1  0  0  8 0  1  1
## 43  0  0  0 0 0  0 1  2  1 2  0  0  0 0  0  0
## 44  0  0  0 0 0  0 1  5  0 1  0  3  0 0  0  0
## 45  0  0  1 0 0  1 0  2  1 1  2  2  0 0  0  0
## 46  0  0  0 0 0  1 0  0  0 0  1  0  0 0  0  0
## 47  0  5  0 0 1  2 0  0  0 0  2  0  0 0  2  0
## 48  1  3  1 0 1  5 2  5  0 0  9  2  0 0  0  1
## 49  0  0  0 0 0  0 1  0  0 0  0  0  0 0  1  0
## 50  1  7  0 1 1  3 1  4  3 3  9  2  1 2  4  4
## 51  3 12  0 0 2  1 0  7  1 1  1  2  0 0  0  0
## 52  0  1  0 0 0  1 1  2  0 0  3  0  0 0  0  0
boxplot(baxter)

# CA for the data
# In the plots (drawn from the first three dimensions) burials are shown as red triangles and pottery types as blue dots
baxter.ca <- ca(baxter)
summary(baxter.ca)
## 
## Principal inertias (eigenvalues):
## 
##  dim    value      %   cum%   scree plot               
##  1      0.604014  27.7  27.7  *******                  
##  2      0.375853  17.2  44.9  ****                     
##  3      0.292531  13.4  58.4  ***                      
##  4      0.152558   7.0  65.4  **                       
##  5      0.151621   7.0  72.3  **                       
##  6      0.125624   5.8  78.1  *                        
##  7      0.105033   4.8  82.9  *                        
##  8      0.090072   4.1  87.0  *                        
##  9      0.082453   3.8  90.8  *                        
##  10     0.051150   2.3  93.1  *                        
##  11     0.042659   2.0  95.1                           
##  12     0.036629   1.7  96.8                           
##  13     0.027803   1.3  98.1                           
##  14     0.023567   1.1  99.1                           
##  15     0.018914   0.9 100.0                           
##         -------- -----                                 
##  Total: 2.180481 100.0                                 
## 
## 
## Rows:
##      name   mass  qlt  inr     k=1 cor ctr     k=2 cor ctr  
## 1  |    1 |   15  279   18 |   544 110   7 |  -674 169  18 |
## 2  |    2 |   10   34   25 |   380  26   2 |   214   8   1 |
## 3  |    3 |    4  245    5 |   -24   0   0 |   820 245   7 |
## 4  |    4 |   13  433   27 |  -477  49   5 |  1334 383  60 |
## 5  |    5 |    2   71   10 |  -337  10   0 |   830  61   4 |
## 6  |    6 |    2   72    5 |   260  12   0 |   569  60   2 |
## 7  |    7 |   10   94   11 |    26   0   0 |   484  94   6 |
## 8  |    8 |    4  320    7 | -1027 288   7 |   341  32   1 |
## 9  |    9 |   24  680   11 |  -777 625  24 |   231  55   3 |
## 10 |   10 |   51  588    5 |   168 130   2 |   314 458  13 |
## 11 |   11 |   16  376   41 |  -542  51   8 |  1366 324  78 |
## 12 |   12 |    5  166    6 |   182  12   0 |   650 154   5 |
## 13 |   13 |   16  953   46 | -2359 867 144 |  -745  86  23 |
## 14 |   14 |    5  166    6 |   222  20   0 |   606 146   5 |
## 15 |   15 |   21  805   21 |   666 203  15 | -1147 602  72 |
## 16 |   16 |   23  398   25 |   558 132  12 |  -792 266  39 |
## 17 |   17 |    8   49   16 |   367  31   2 |   284  18   2 |
## 18 |   18 |    8  216    8 |    89   3   0 |   702 213  10 |
## 19 |   19 |   19  495   21 |   646 166  13 |  -909 329  41 |
## 20 |   20 |    8  404   15 |  -296  21   1 |  1264 383  33 |
## 21 |   21 |    8  223   13 |   563  85   4 |  -715 138  11 |
## 22 |   22 |    4   63   11 |   223   8   0 |   582  55   4 |
## 23 |   23 |    4   86   42 |  -834  30   5 |  1150  57  14 |
## 24 |   24 |    6  272   27 |  -411  17   2 |  1613 255  41 |
## 25 |   25 |   39  994   89 | -2182 958 308 |  -427  37  19 |
## 26 |   26 |   26  628   45 |   705 134  22 | -1353 494 129 |
## 27 |   27 |   33  141   15 |   327 106   6 |   188  35   3 |
## 28 |   28 |   18  936   50 | -2314 873 156 |  -622  63  18 |
## 29 |   29 |   19  958   53 | -2335 880 168 |  -697  78  24 |
## 30 |   30 |   16  276    6 |    88   9   0 |   485 267  10 |
## 31 |   31 |  128  252   28 |   247 127  13 |   245 125  21 |
## 32 |   32 |    5   55   17 |   218   6   0 |   597  48   5 |
## 33 |   33 |    8   89   10 |   362  47   2 |   343  42   2 |
## 34 |   34 |   20  137   15 |   379  88   5 |  -280  48   4 |
## 35 |   35 |   14  234   16 |    22   0   0 |   764 234  21 |
## 36 |   36 |    9  334    9 |   560 137   5 |  -669 196  10 |
## 37 |   37 |   10  211   26 |   645  71   7 |  -903 140  21 |
## 38 |   38 |    2   42   22 |   641  17   1 |  -780  25   3 |
## 39 |   39 |   75  395    9 |   228 200   6 |   225 195  10 |
## 40 |   40 |    4  176   16 |   630  45   3 | -1079 132  12 |
## 41 |   41 |  128  638   19 |   320 314  22 |  -325 325  36 |
## 42 |   42 |   12  424   34 |   683  74   9 | -1482 350  69 |
## 43 |   43 |    6   51   12 |   469  48   2 |  -101   2   0 |
## 44 |   44 |   10   62   18 |   400  40   3 |   293  22   2 |
## 45 |   45 |   10   11    5 |    44   2   0 |   100   9   0 |
## 46 |   46 |    2   72    5 |   260  12   0 |   569  60   2 |
## 47 |   47 |   12  251    9 |    14   0   0 |   629 251  12 |
## 48 |   48 |   29  270   12 |   109  14   1 |   470 256  17 |
## 49 |   49 |    2   77    8 |   492  27   1 |  -672  50   2 |
## 50 |   50 |   45  336    5 |   279 300   6 |    96  36   1 |
## 51 |   51 |   29  432   19 |    17   0   0 |   779 432  47 |
## 52 |   52 |    8  187    6 |   243  37   1 |   490 150   5 |
## 
## Columns:
##      name   mass  qlt  inr     k=1 cor ctr     k=2 cor ctr  
## 1  |    a |   38  112   30 |    93   5   1 |   428 107  19 |
## 2  |    b |  145  552   60 |  -271  82  18 |   650 470 163 |
## 3  |    c |   78  984  237 | -2440 903 772 |  -729  81 111 |
## 4  |    d |   23  148   82 |  -774  79  23 |   723  69  33 |
## 5  |    e |   26  333   69 |  -368  24   6 |  1328 310 124 |
## 6  |    f |   53  103   43 |   153  13   2 |   402  90  23 |
## 7  |    g |   39   70   21 |   242  50   4 |  -154  20   2 |
## 8  |    h |   74  154   43 |   316  79  12 |   310  75  19 |
## 9  |    i |   39  191   45 |   498  99  16 |  -478  92  24 |
## 10 |    j |   42  206   19 |   407 172  12 |  -180  34   4 |
## 11 |    k |  148  181   56 |   251  76  15 |   295 105  34 |
## 12 |    l |   81   87   53 |   293  60  12 |   193  26   8 |
## 13 |    m |   66  569  107 |   665 124  48 | -1258 444 276 |
## 14 |    n |   40  345   35 |   542 155  19 |  -601 190  38 |
## 15 |    o |   68  306   74 |   522 115  31 |  -671 190  82 |
## 16 |    p |   38  372   26 |   391 103  10 |  -633 270  41 |
plot(baxter.ca)

plot(baxter.ca, dim=c(1,3))

plot(baxter.ca, dim=c(2,3))

# Removing row c as an outlier
baxter.ca <- ca(baxter[,-c(3)])
plot(baxter.ca)

5: Cluster & Principal Component Analysis

Hierarchical cluster analysis

# Setting up and renaming "ï..height" back to "height"
setwd("~/R/SADE/week5")
library(cluster)

peacock<-read.csv(file="peacock.csv", header=TRUE, sep=",")
colnames(peacock)[colnames(peacock) == 'ï..height'] <- 'height'

# Exploring the data 
# The "las=2" in boxplot flips the x axis labels so they don't overlap
summary(peacock)
##      height          diameter       hopper_slope   hopper_orifice 
##  Min.   : 39.00   Min.   : 35.00   Min.   :19.00   Min.   : 6.00  
##  1st Qu.: 64.00   1st Qu.: 66.50   1st Qu.:37.25   1st Qu.:15.50  
##  Median : 70.00   Median : 75.00   Median :41.00   Median :20.00  
##  Mean   : 70.79   Mean   : 72.93   Mean   :40.90   Mean   :19.74  
##  3rd Qu.: 81.25   3rd Qu.: 80.00   3rd Qu.:46.00   3rd Qu.:25.00  
##  Max.   :100.00   Max.   :102.00   Max.   :57.00   Max.   :30.00  
##    thickness      mount_protrusion  mount_width     mount_height  
##  Min.   : 0.000   Min.   :0.00     Min.   :15.00   Min.   :11.50  
##  1st Qu.: 2.500   1st Qu.:0.50     1st Qu.:25.25   1st Qu.:20.50  
##  Median : 3.250   Median :0.50     Median :30.00   Median :24.00  
##  Mean   : 4.631   Mean   :1.11     Mean   :29.17   Mean   :22.82  
##  3rd Qu.: 6.000   3rd Qu.:1.00     3rd Qu.:34.00   3rd Qu.:25.00  
##  Max.   :13.000   Max.   :5.00     Max.   :50.00   Max.   :30.00  
##  m_aper_height    m_aper_width   mount_mortice      girth_band   
##  Min.   : 5.00   Min.   : 6.00   Min.   : 3.000   Min.   :1.000  
##  1st Qu.:10.00   1st Qu.:11.12   1st Qu.: 5.000   1st Qu.:1.000  
##  Median :11.00   Median :12.00   Median : 5.500   Median :1.000  
##  Mean   :10.77   Mean   :11.89   Mean   : 5.714   Mean   :1.429  
##  3rd Qu.:12.00   3rd Qu.:12.50   3rd Qu.: 6.000   3rd Qu.:2.000  
##  Max.   :16.50   Max.   :17.00   Max.   :12.000   Max.   :2.000  
##      type          
##  Length:42         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
boxplot(peacock[-c(12:13)], las=2)

# Scaling the data
peacock.std<-scale(peacock[1:11])
peacock.std
##            height     diameter hopper_slope hopper_orifice   thickness
##  [1,] -0.71959107 -1.278452126   -0.8250442     0.39711929 -0.80081772
##  [2,] -0.31928875  0.156435070   -0.5860659     1.27496194 -0.19205131
##  [3,] -0.05242054 -0.221166823   -0.4665767    -0.48072335  0.87328991
##  [4,] -0.31928875 -0.749809475    0.7283149     0.92382488 -0.49643451
##  [5,] -0.25257170 -0.372207581   -0.3470876     0.04598223 -0.80081772
##  [6,]  0.34788178  0.382996206    0.8478041     0.57268782 -0.64862612
##  [7,]  0.28116472  0.005394313    0.4893366     0.57268782 -0.49643451
##  [8,] -2.05393213 -2.260217049   -1.9004467    -1.88527159  0.11233190
##  [9,]  0.14773062  0.231955449    0.6088257     1.80166753 -0.49643451
## [10,]  1.74893988  2.195485296    1.0867824     0.92382488  0.41671510
## [11,]  1.54878873  1.062679615    1.3257607    -1.00742894  0.72109831
## [12,]  0.14773062  0.685077721    0.6088257    -0.12958630  0.41671510
## [13,] -0.45272286 -0.749809475   -0.9445334     0.04598223 -0.34424291
## [14,]  0.08101356 -0.598768717   -0.3470876    -0.83186041 -0.49643451
## [15,] -1.98721507 -2.864380079   -2.4978925    -2.41197718  0.11233190
## [16,]  0.21444767  0.156435070   -0.1081092     0.57268782  0.11233190
## [17,] -0.18585465  1.062679615    0.4893366    -1.00742894  1.63424792
## [18,]  0.88161820  0.231955449    0.2503582    -0.48072335  2.24301434
## [19,]  0.01429651  0.156435070    0.4893366    -0.48072335 -0.19205131
## [20,] -0.85302518 -0.070126066   -0.1081092     0.04598223 -0.80081772
## [21,] -0.05242054  0.005394313   -0.1081092     1.09939341 -0.80081772
## [22,]  0.81490114  0.760598100    1.4452499    -0.83186041 -0.03985971
## [23,]  1.21520346  1.138199994    1.9232065     0.92382488 -0.80081772
## [24,]  0.88161820  0.911638858    0.7283149     0.92382488  0.11233190
## [25,]  0.21444767  0.005394313   -0.1081092    -1.35856600  1.93863113
## [26,] -0.38600581  1.138199994   -0.1081092     1.27496194  1.17767312
## [27,] -0.45272286 -0.221166823    0.1308691     0.22155076 -0.64862612
## [28,] -2.12064918 -2.637818943   -2.6173816    -2.23640865 -0.03985971
## [29,] -0.45272286  0.382996206    0.3698474    -0.83186041 -0.64862612
## [30,]  1.14848641  0.987159236    0.7283149    -0.48072335  1.02548151
## [31,]  0.21444767  0.534036964    0.3698474     0.04598223 -0.64862612
## [32,] -0.05242054 -0.523248338   -0.2275984    -0.30515483 -1.40958413
## [33,] -0.71959107 -0.598768717   -1.3030009     0.92382488 -1.25739253
## [34,] -0.31928875  0.231955449    0.3698474     0.92382488 -0.64862612
## [35,] -0.58615696 -0.598768717   -0.4665767    -0.83186041 -0.49643451
## [36,] -0.45272286  0.005394313   -0.2275984     1.27496194 -0.64862612
## [37,]  1.68222283  0.156435070    0.4893366    -1.18299747  2.24301434
## [38,] -1.85378097 -0.900850232   -1.7809575     0.92382488 -0.49643451
## [39,]  1.94909104  1.289240751    1.3257607     0.22155076  2.54739754
## [40,]  1.41535462  0.534036964   -0.1081092    -0.30515483  0.72109831
## [41,] -1.11989339 -0.221166823   -0.7055551     0.92382488 -0.64862612
## [42,]  0.81490114  0.458516585    1.0867824     0.22155076 -1.40958413
##       mount_protrusion mount_width mount_height m_aper_height m_aper_width
##  [1,]      -0.89881701 -2.04383044  -1.64571371     0.6244279   0.04973618
##  [2,]       0.31632187  0.12022532   0.04308151     0.6244279   0.04973618
##  [3,]      -0.49377072  0.12022532   0.28433797     0.6244279   0.04973618
##  [4,]      -0.49377072  0.69730686   0.76685089     0.3698068   0.28183836
##  [5,]      -0.49377072  0.26449570   0.28433797     0.1151857   0.51394054
##  [6,]      -0.49377072 -0.02404506   0.76685089     0.1151857   0.04973618
##  [7,]      -0.49377072  0.40876609   0.52559443     0.1151857   0.04973618
##  [8,]      -0.08872443 -1.75528967  -2.36948309    -2.1764040  -2.50338780
##  [9,]      -0.49377072 -0.45685622  -0.07754672     0.1151857  -0.18236600
## [10,]      -0.65578924  1.70719954   1.73187673     1.6429122   1.44234926
## [11,]      -0.49377072  0.98584762   1.49062027     0.6244279   0.97814490
## [12,]      -0.65578924  0.69730686   1.00810735     0.1151857   0.04973618
## [13,]       1.93650704 -0.74539698  -0.19817495    -0.9032986  -1.34287690
## [14,]      -0.49377072 -0.60112660   0.52559443    -0.3940564   0.04973618
## [15,]      -0.49377072 -1.89956005  -2.61073955    -2.1764040  -2.73548998
## [16,]      -0.08872443  0.69730686   0.28433797     0.1151857   0.04973618
## [17,]      -0.49377072  0.12022532   0.28433797     0.1151857   0.04973618
## [18,]      -0.89881701 -0.31258583  -0.19817495    -0.3940564   0.04973618
## [19,]      -0.49377072 -0.31258583   0.52559443    -0.3940564   0.04973618
## [20,]       0.72136816 -0.02404506   0.28433797    -0.6486775  -0.41446818
## [21,]      -0.49377072  0.26449570  -0.19817495    -0.3940564   0.04973618
## [22,]      -0.49377072  3.00563300   0.52559443    -0.3940564   0.04973618
## [23,]      -0.49377072  0.55303647  -0.68068787     0.8790490   0.74604272
## [24,]      -0.49377072  0.40876609   1.24936381     2.9160177   0.74604272
## [25,]      -0.89881701 -0.88966737  -1.04257256    -0.6486775  -0.87867254
## [26,]      -0.49377072  0.69730686   0.76685089     0.8790490   0.97814490
## [27,]      -0.08872443 -0.31258583  -0.19817495    -0.3940564  -0.41446818
## [28,]      -0.49377072 -2.04383044  -2.73136778    -2.9402673  -2.50338780
## [29,]      -0.08872443 -0.02404506   0.52559443     0.3698068  -0.41446818
## [30,]      -0.49377072  0.84157724  -0.19817495     0.1151857   0.04973618
## [31,]      -0.49377072 -0.02404506   1.00810735    -0.3940564   0.04973618
## [32,]       1.53146075 -1.32247852  -0.68068787    -1.4125408  -0.87867254
## [33,]       2.74659963 -0.60112660  -0.68068787    -0.3940564  -0.41446818
## [34,]      -0.08872443  0.55303647   0.52559443     0.8790490   0.97814490
## [35,]       2.74659963 -1.03393775  -0.68068787     0.1151857   0.04973618
## [36,]       0.72136816  0.26449570   0.28433797     0.8790490   0.04973618
## [37,]      -0.49377072  0.12022532   0.52559443     0.1151857   0.28183836
## [38,]      -0.08872443 -1.17820813  -0.19817495     0.6244279   0.04973618
## [39,]       0.72136816  0.84157724   1.00810735     0.3698068   2.37075798
## [40,]      -0.08872443  0.69730686   0.52559443     0.6244279   0.97814490
## [41,]       3.15164592  0.69730686  -0.68068787    -0.6486775  -0.41446818
## [42,]      -0.08872443  0.84157724  -0.68068787     0.6244279   1.90655362
##       mount_mortice
##  [1,]    -1.4581228
##  [2,]    -0.1151150
##  [3,]    -0.9209197
##  [4,]    -0.3837165
##  [5,]    -0.3837165
##  [6,]     0.1534866
##  [7,]    -0.1151150
##  [8,]    -1.4581228
##  [9,]    -0.3837165
## [10,]     0.1534866
## [11,]     0.4220882
## [12,]    -0.1151150
## [13,]    -0.3837165
## [14,]    -0.3837165
## [15,]    -1.4581228
## [16,]    -0.1151150
## [17,]     3.3767055
## [18,]     3.3767055
## [19,]    -0.3837165
## [20,]    -0.1151150
## [21,]    -0.1151150
## [22,]     0.1534866
## [23,]     0.4220882
## [24,]     0.1534866
## [25,]    -0.6523181
## [26,]     0.1534866
## [27,]    -0.9209197
## [28,]    -1.4581228
## [29,]    -0.6523181
## [30,]     0.4220882
## [31,]    -0.1151150
## [32,]    -0.3837165
## [33,]    -0.3837165
## [34,]     0.1534866
## [35,]     0.1534866
## [36,]    -0.1151150
## [37,]    -0.1151150
## [38,]     0.6906898
## [39,]     1.2278929
## [40,]     1.2278929
## [41,]    -0.3837165
## [42,]     1.2278929
## attr(,"scaled:center")
##           height         diameter     hopper_slope   hopper_orifice 
##        70.785714        72.928571        40.904762        19.738095 
##        thickness mount_protrusion      mount_width     mount_height 
##         4.630952         1.109524        29.166667        22.821429 
##    m_aper_height     m_aper_width    mount_mortice 
##        10.773810        11.892857         5.714286 
## attr(,"scaled:scale")
##           height         diameter     hopper_slope   hopper_orifice 
##        14.988672        13.241459         8.368960         5.695782 
##        thickness mount_protrusion      mount_width     mount_height 
##         3.285332         1.234427         6.931430         4.144967 
##    m_aper_height     m_aper_width    mount_mortice 
##         1.963702         2.154224         1.861493
boxplot(peacock.std, las=2)

# Summary statistics with agnes() command
agnes(peacock.std)
## Call:     agnes(x = peacock.std) 
## Agglomerative coefficient:  0.7512433 
## Order of objects:
##  [1]  1 38 13 32 35 33 41  2 36  4 34  9  5 16 21  6  7 19 31 12 14 20 27 29  3
## [26] 26 23 42 11 30 40 37 22 10 24 25 17 18 39  8 15 28
## Height (summary):
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7851  1.5464  2.0448  2.4766  3.2775  7.7150 
## 
## Available components:
## [1] "order"  "height" "ac"     "merge"  "diss"   "call"   "method" "data"
# Cluster analysis (HCA): Single linkage
peacock.single <- hclust(dist(peacock.std, method="euclid"), method="single")
plot(peacock.single, main="Single Linkage")

# Outlining 5 clusters in the dendrogram
rect.hclust(peacock.single, k=5)

# Adding the assigned group as a column to the original data frame
peacock.single.5<-cutree(peacock.single,k=5)
peacock$single.5 <- peacock.single.5
peacock
##    height diameter hopper_slope hopper_orifice thickness mount_protrusion
## 1      60       56           34             22       2.0              0.0
## 2      66       75           36             27       4.0              1.5
## 3      70       70           37             17       7.5              0.5
## 4      66       63           47             25       3.0              0.5
## 5      67       68           38             20       2.0              0.5
## 6      76       78           48             23       2.5              0.5
## 7      75       73           45             23       3.0              0.5
## 8      40       43           25              9       5.0              1.0
## 9      73       76           46             30       3.0              0.5
## 10     97      102           50             25       6.0              0.3
## 11     94       87           52             14       7.0              0.5
## 12     73       82           46             19       6.0              0.3
## 13     64       63           33             20       3.5              3.5
## 14     72       65           38             15       3.0              0.5
## 15     41       35           20              6       5.0              0.5
## 16     74       75           40             23       5.0              1.0
## 17     68       87           45             14      10.0              0.5
## 18     84       76           43             17      12.0              0.0
## 19     71       75           45             17       4.0              0.5
## 20     58       72           40             20       2.0              2.0
## 21     70       73           40             26       2.0              0.5
## 22     83       83           53             15       4.5              0.5
## 23     89       88           57             25       2.0              0.5
## 24     84       85           47             25       5.0              0.5
## 25     74       73           40             12      11.0              0.0
## 26     65       88           40             27       8.5              0.5
## 27     64       70           42             21       2.5              1.0
## 28     39       38           19              7       4.5              0.5
## 29     64       78           44             15       2.5              1.0
## 30     88       86           47             17       8.0              0.5
## 31     74       80           44             20       2.5              0.5
## 32     70       66           39             18       0.0              3.0
## 33     60       65           30             25       0.5              4.5
## 34     66       76           44             25       2.5              1.0
## 35     62       65           37             15       3.0              4.5
## 36     64       73           39             27       2.5              2.0
## 37     96       75           45             13      12.0              0.5
## 38     43       61           26             25       3.0              1.0
## 39    100       90           52             21      13.0              2.0
## 40     92       80           40             18       7.0              1.0
## 41     54       70           35             25       2.5              5.0
## 42     83       79           50             21       0.0              1.0
##    mount_width mount_height m_aper_height m_aper_width mount_mortice girth_band
## 1           15         16.0          12.0         12.0           3.0          2
## 2           30         23.0          12.0         12.0           5.5          1
## 3           30         24.0          12.0         12.0           4.0          2
## 4           34         26.0          11.5         12.5           5.0          1
## 5           31         24.0          11.0         13.0           5.0          1
## 6           29         26.0          11.0         12.0           6.0          1
## 7           32         25.0          11.0         12.0           5.5          1
## 8           17         13.0           6.5          6.5           3.0          1
## 9           26         22.5          11.0         11.5           5.0          2
## 10          41         30.0          14.0         15.0           6.0          1
## 11          36         29.0          12.0         14.0           6.5          1
## 12          34         27.0          11.0         12.0           5.5          1
## 13          24         22.0           9.0          9.0           5.0          1
## 14          25         25.0          10.0         12.0           5.0          2
## 15          16         12.0           6.5          6.0           3.0          2
## 16          34         24.0          11.0         12.0           5.5          2
## 17          30         24.0          11.0         12.0          12.0          2
## 18          27         22.0          10.0         12.0          12.0          2
## 19          27         25.0          10.0         12.0           5.0          2
## 20          29         24.0           9.5         11.0           5.5          1
## 21          31         22.0          10.0         12.0           5.5          1
## 22          50         25.0          10.0         12.0           6.0          1
## 23          33         20.0          12.5         13.5           6.5          1
## 24          32         28.0          16.5         13.5           6.0          1
## 25          23         18.5           9.5         10.0           4.5          2
## 26          34         26.0          12.5         14.0           6.0          1
## 27          27         22.0          10.0         11.0           4.0          1
## 28          15         11.5           5.0          6.5           3.0          1
## 29          29         25.0          11.5         11.0           4.5          2
## 30          35         22.0          11.0         12.0           6.5          2
## 31          29         27.0          10.0         12.0           5.5          1
## 32          20         20.0           8.0         10.0           5.0          2
## 33          25         20.0          10.0         11.0           5.0          2
## 34          33         25.0          12.5         14.0           6.0          1
## 35          22         20.0          11.0         12.0           6.0          2
## 36          31         24.0          12.5         12.0           5.5          2
## 37          30         25.0          11.0         12.5           5.5          1
## 38          21         22.0          12.0         12.0           7.0          2
## 39          35         27.0          11.5         17.0           8.0          1
## 40          34         25.0          12.0         14.0           8.0          1
## 41          34         20.0           9.5         11.0           5.0          2
## 42          35         20.0          12.0         16.0           8.0          1
##    type single.5
## 1     B        1
## 2     C        2
## 3     C        2
## 4     C        2
## 5     C        2
## 6     C        2
## 7     C        2
## 8     A        3
## 9     C        2
## 10    D        2
## 11    D        2
## 12    C        2
## 13    B        2
## 14    C        2
## 15    A        3
## 16    C        2
## 17    D        4
## 18    D        4
## 19    C        2
## 20    C        2
## 21    C        2
## 22    D        2
## 23    D        2
## 24    D        2
## 25    C        2
## 26    D        2
## 27    C        2
## 28    A        3
## 29    C        2
## 30    D        2
## 31    C        2
## 32    B        2
## 33    B        2
## 34    C        2
## 35    B        2
## 36    C        2
## 37    D        2
## 38    B        2
## 39    E        5
## 40    E        2
## 41    C        2
## 42    E        2
# Cross-tabulating the new groups vector (groups 1-5) with the original "type" column (types A-E)
# In our experimental groups, a few outliers dominate the "typing" (group 2 has 35 mills!)
addmargins(table(peacock.single.5,peacock[,13]))
##                 
## peacock.single.5  A  B  C  D  E Sum
##              1    0  1  0  0  0   1
##              2    0  5 20  8  2  35
##              3    3  0  0  0  0   3
##              4    0  0  0  2  0   2
##              5    0  0  0  0  1   1
##              Sum  3  6 20 10  3  42
# Average method (HCA) 
peacock.ave<-
hclust(dist(peacock.std,method="euclid"),method="ave")

# Wards method (HCA) 
peacock.ward<-
hclust(dist(peacock.std,method="euclid"),method="ward")

# Plotting all three methods for comparison
plot(peacock.single,main="Single Linkage")
rect.hclust(peacock.single,k=5)

plot(peacock.ave,main="Average Linkage")
rect.hclust(peacock.ave,k=5)

plot(peacock.ward,main="Ward")
rect.hclust(peacock.ward,k=5)

# Examinimg the new groups with cross tabulation
peacock.ave.5<-cutree(peacock.ave,k=5)
peacock.ward.5<-cutree(peacock.ward,k=5)
addmargins(table(peacock.single.5,peacock[,13]))
##                 
## peacock.single.5  A  B  C  D  E Sum
##              1    0  1  0  0  0   1
##              2    0  5 20  8  2  35
##              3    3  0  0  0  0   3
##              4    0  0  0  2  0   2
##              5    0  0  0  0  1   1
##              Sum  3  6 20 10  3  42
addmargins(table(peacock.ave.5,peacock[,13]))
##              
## peacock.ave.5  A  B  C  D  E Sum
##           1    0  6  1  0  0   7
##           2    0  0 19  8  2  29
##           3    3  0  0  0  0   3
##           4    0  0  0  2  0   2
##           5    0  0  0  0  1   1
##           Sum  3  6 20 10  3  42
addmargins(table(peacock.ward.5,peacock[,13]))
##               
## peacock.ward.5  A  B  C  D  E Sum
##            1    0  2 19  0  0  21
##            2    3  0  0  0  0   3
##            3    0  0  0  8  3  11
##            4    0  4  1  0  0   5
##            5    0  0  0  2  0   2
##            Sum  3  6 20 10  3  42

K-Means Clustering

# This analysis for determining optimal amount of clusters is sensitive to the set seed
# But rerunning the analysis with a few different seeds shows that more often than not, 5 seems to be a good fit for this data
library(tidyverse)
set.seed(12)
wss <- function(k) {
kmeans(peacock.std, k)$tot.withinss
}
k.values <- 1:10
wss_values <- map_dbl(k.values, wss)
plot(k.values, wss_values,
type="b", pch = 19, frame = FALSE,
xlab="Number of clusters K",
ylab="Total within-clusters sum of squares")

# K-means partitioning
peacock.k5<-kmeans(peacock.std,centers=5)
peacock.k5
## K-means clustering with 5 clusters of sizes 9, 8, 5, 2, 18
## 
## Cluster means:
##       height   diameter hopper_slope hopper_orifice  thickness mount_protrusion
## 1  1.1114214  1.0542885  0.967293217      0.2605660  0.3828947       -0.2867471
## 2 -1.1365726 -1.3067723 -1.332873153     -0.7221301 -0.4964345        1.3795684
## 3  0.6280934  0.3225799  0.250358244     -0.7265193  1.2994264       -0.6071837
## 4  0.3478818  0.6473175  0.369847406     -0.7440761  1.9386311       -0.6962939
## 5 -0.2636912 -0.1078863 -0.001896653      0.4751498 -0.5471650       -0.2237399
##   mount_width mount_height m_aper_height m_aper_width mount_mortice
## 1  1.08202788   0.65962580    0.90734021   1.13287969     0.5713113
## 2 -1.08803914  -1.32906461   -1.31705787  -1.34287690    -0.7194685
## 3  0.17793347   0.11545845    0.06426151  -0.08952513    -0.2762759
## 4 -0.09618026   0.04308151   -0.13943536   0.04973618     3.3767055
## 5 -0.09618026   0.22402386    0.12933134   0.04973618    -0.2643381
## 
## Clustering vector:
##  [1] 5 5 3 5 5 5 5 2 5 1 1 3 2 5 2 5 4 4 5 5 5 1 1 1 3 1 5 2 5 3 5 2 2 5 2 5 3 5
## [39] 1 1 2 1
## 
## Within cluster sum of squares by cluster:
## [1] 50.908437 82.475658 14.881095  1.689131 52.154370
##  (between_SS / total_SS =  55.2 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
addmargins(table(peacock.k5$cluster, peacock[,14]))
##      
##        1  2  3  4  5 Sum
##   1    0  8  0  0  1   9
##   2    0  5  3  0  0   8
##   3    0  5  0  0  0   5
##   4    0  0  0  2  0   2
##   5    1 17  0  0  0  18
##   Sum  1 35  3  2  1  42
# Defining my own color vector for the plot, then plotting the pairs
mycols<-c("black","red","blue","green","cyan")
plot(peacock[1:11],col=mycols[peacock.k5$cluster],pch=19, cex=0.5)

# Trying again with 3 klusters
peacock.k3<-kmeans(peacock.std,centers=3)
plot(peacock[1:11],col=mycols[peacock.k3$cluster],pch=19, cex=0.3)

# Plotting the height and diameter pair in better resolution
plot(peacock[,1]~peacock[,2],pch=16,col=mycols[peacock.k3$cluster])

Principal Component Analysis

# Carrying out PCA
peacockPCA<-prcomp(peacock[,1:11],center=T,scale=T)

plot(peacockPCA)

peacockPCA$sdev^2
##  [1] 6.04208292 1.76178617 0.91801496 0.66495332 0.45364417 0.35802200
##  [7] 0.26308415 0.21807116 0.15130706 0.09460398 0.07443011
summary(peacockPCA)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.4581 1.3273 0.95813 0.81545 0.67353 0.59835 0.51292
## Proportion of Variance 0.5493 0.1602 0.08346 0.06045 0.04124 0.03255 0.02392
## Cumulative Proportion  0.5493 0.7094 0.79290 0.85335 0.89459 0.92714 0.95105
##                            PC8     PC9   PC10    PC11
## Standard deviation     0.46698 0.38898 0.3076 0.27282
## Proportion of Variance 0.01982 0.01376 0.0086 0.00677
## Cumulative Proportion  0.97088 0.98463 0.9932 1.00000
# Plotting the PCA
biplot(peacockPCA,main="PCA biplot of Roman mills")

# Plotting with Peacock’s groups instead of numbers
biplot(peacockPCA,xlabs=peacock[,13], main="PCA biplot of mill
data with Peacock groups")

# Plotting with K-means groups instead of numbers
biplot(peacockPCA,xlabs=peacock.k5$cluster)

# Using FactoMineR for PCA
library("FactoMineR")
peacockPCA_FTMR<-PCA(peacock[1:11])

# Diagnostics with FactoMineR
round(peacockPCA_FTMR$eig,2)
##         eigenvalue percentage of variance cumulative percentage of variance
## comp 1        6.04                  54.93                             54.93
## comp 2        1.76                  16.02                             70.94
## comp 3        0.92                   8.35                             79.29
## comp 4        0.66                   6.05                             85.33
## comp 5        0.45                   4.12                             89.46
## comp 6        0.36                   3.25                             92.71
## comp 7        0.26                   2.39                             95.11
## comp 8        0.22                   1.98                             97.09
## comp 9        0.15                   1.38                             98.46
## comp 10       0.09                   0.86                             99.32
## comp 11       0.07                   0.68                            100.00
dimdesc(peacockPCA_FTMR)
## $Dim.1
## $quanti
##                correlation      p.value
## diameter         0.9509599 5.431659e-22
## m_aper_width     0.9035309 2.623704e-16
## hopper_slope     0.8906081 2.868054e-15
## mount_height     0.8594395 3.201079e-13
## height           0.8586321 3.561697e-13
## mount_width      0.8270635 1.475047e-11
## m_aper_height    0.8041048 1.422684e-10
## mount_mortice    0.5873993 4.322354e-05
## hopper_orifice   0.4900349 9.872966e-04
## 
## attr(,"class")
## [1] "condes" "list"  
## 
## $Dim.2
## $quanti
##                  correlation      p.value
## hopper_orifice     0.7361042 2.773713e-08
## mount_protrusion   0.5285502 3.205013e-04
## mount_mortice     -0.3141322 4.276708e-02
## thickness         -0.8121858 6.637085e-11
## 
## attr(,"class")
## [1] "condes" "list"  
## 
## $Dim.3
## $quanti
##                  correlation      p.value
## mount_protrusion   0.7431472 1.735630e-08
## mount_mortice      0.5262374 3.442161e-04
## 
## attr(,"class")
## [1] "condes" "list"  
## 
## $call
## $call$num.var
## [1] 1
## 
## $call$proba
## [1] 0.05
## 
## $call$weights
##  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## [39] 1 1 1 1
## 
## $call$X
##           Dim.1 height diameter hopper_slope hopper_orifice thickness
## 1  -2.392608517     60       56           34             22       2.0
## 2   0.191382920     66       75           36             27       4.0
## 3  -0.078525088     70       70           37             17       7.5
## 4   0.678491086     66       63           47             25       3.0
## 5  -0.073759424     67       68           38             20       2.0
## 6   1.026468957     76       78           48             23       2.5
## 7   0.738676447     75       73           45             23       3.0
## 8  -6.107597146     40       43           25              9       5.0
## 9   0.412469316     73       76           46             30       3.0
## 10  4.486798042     97      102           50             25       6.0
## 11  2.915117831     94       87           52             14       7.0
## 12  1.252522191     73       82           46             19       6.0
## 13 -2.231543247     64       63           33             20       3.5
## 14 -0.736273084     72       65           38             15       3.0
## 15 -6.828208755     41       35           20              6       5.0
## 16  0.601644379     74       75           40             23       5.0
## 17  1.582376072     68       87           45             14      10.0
## 18  1.277186435     84       76           43             17      12.0
## 19  0.047794055     71       75           45             17       4.0
## 20 -0.827509978     58       72           40             20       2.0
## 21 -0.002046682     70       73           40             26       2.0
## 22  2.125103519     83       83           53             15       4.5
## 23  2.337175373     89       88           57             25       2.0
## 24  3.043995150     84       85           47             25       5.0
## 25 -1.291609041     74       73           40             12      11.0
## 26  1.914466246     65       88           40             27       8.5
## 27 -0.906433877     64       70           42             21       2.5
## 28 -7.071496267     39       38           19              7       4.5
## 29 -0.122516269     64       78           44             15       2.5
## 30  1.506544425     88       86           47             17       8.0
## 31  0.608690174     74       80           44             20       2.5
## 32 -2.260204496     70       66           39             18       0.0
## 33 -2.014192118     60       65           30             25       0.5
## 34  1.297895774     66       76           44             25       2.5
## 35 -1.600435011     62       65           37             15       3.0
## 36  0.342581203     64       73           39             27       2.5
## 37  1.251937992     96       75           45             13      12.0
## 38 -1.604329350     43       61           26             25       3.0
## 39  3.902996798    100       90           52             21      13.0
## 40  1.993368175     92       80           40             18       7.0
## 41 -1.397278369     54       70           35             25       2.5
## 42  2.010884159     83       79           50             21       0.0
##    mount_protrusion mount_width mount_height m_aper_height m_aper_width
## 1               0.0          15         16.0          12.0         12.0
## 2               1.5          30         23.0          12.0         12.0
## 3               0.5          30         24.0          12.0         12.0
## 4               0.5          34         26.0          11.5         12.5
## 5               0.5          31         24.0          11.0         13.0
## 6               0.5          29         26.0          11.0         12.0
## 7               0.5          32         25.0          11.0         12.0
## 8               1.0          17         13.0           6.5          6.5
## 9               0.5          26         22.5          11.0         11.5
## 10              0.3          41         30.0          14.0         15.0
## 11              0.5          36         29.0          12.0         14.0
## 12              0.3          34         27.0          11.0         12.0
## 13              3.5          24         22.0           9.0          9.0
## 14              0.5          25         25.0          10.0         12.0
## 15              0.5          16         12.0           6.5          6.0
## 16              1.0          34         24.0          11.0         12.0
## 17              0.5          30         24.0          11.0         12.0
## 18              0.0          27         22.0          10.0         12.0
## 19              0.5          27         25.0          10.0         12.0
## 20              2.0          29         24.0           9.5         11.0
## 21              0.5          31         22.0          10.0         12.0
## 22              0.5          50         25.0          10.0         12.0
## 23              0.5          33         20.0          12.5         13.5
## 24              0.5          32         28.0          16.5         13.5
## 25              0.0          23         18.5           9.5         10.0
## 26              0.5          34         26.0          12.5         14.0
## 27              1.0          27         22.0          10.0         11.0
## 28              0.5          15         11.5           5.0          6.5
## 29              1.0          29         25.0          11.5         11.0
## 30              0.5          35         22.0          11.0         12.0
## 31              0.5          29         27.0          10.0         12.0
## 32              3.0          20         20.0           8.0         10.0
## 33              4.5          25         20.0          10.0         11.0
## 34              1.0          33         25.0          12.5         14.0
## 35              4.5          22         20.0          11.0         12.0
## 36              2.0          31         24.0          12.5         12.0
## 37              0.5          30         25.0          11.0         12.5
## 38              1.0          21         22.0          12.0         12.0
## 39              2.0          35         27.0          11.5         17.0
## 40              1.0          34         25.0          12.0         14.0
## 41              5.0          34         20.0           9.5         11.0
## 42              1.0          35         20.0          12.0         16.0
##    mount_mortice
## 1            3.0
## 2            5.5
## 3            4.0
## 4            5.0
## 5            5.0
## 6            6.0
## 7            5.5
## 8            3.0
## 9            5.0
## 10           6.0
## 11           6.5
## 12           5.5
## 13           5.0
## 14           5.0
## 15           3.0
## 16           5.5
## 17          12.0
## 18          12.0
## 19           5.0
## 20           5.5
## 21           5.5
## 22           6.0
## 23           6.5
## 24           6.0
## 25           4.5
## 26           6.0
## 27           4.0
## 28           3.0
## 29           4.5
## 30           6.5
## 31           5.5
## 32           5.0
## 33           5.0
## 34           6.0
## 35           6.0
## 36           5.5
## 37           5.5
## 38           7.0
## 39           8.0
## 40           8.0
## 41           5.0
## 42           8.0
round(peacockPCA_FTMR$var$cor,2)
##                  Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## height            0.86 -0.26  0.00 -0.20  0.03
## diameter          0.95 -0.02  0.07 -0.08  0.01
## hopper_slope      0.89 -0.05 -0.11 -0.26 -0.20
## hopper_orifice    0.49  0.74 -0.03  0.30  0.01
## thickness         0.29 -0.81  0.20  0.17  0.38
## mount_protrusion -0.23  0.53  0.74 -0.25  0.20
## mount_width       0.83  0.04 -0.01 -0.35 -0.03
## mount_height      0.86  0.10 -0.08 -0.09  0.19
## m_aper_height     0.80  0.28 -0.15  0.33  0.20
## m_aper_width      0.90  0.14  0.05  0.13  0.01
## mount_mortice     0.59 -0.31  0.53  0.33 -0.39
round(peacockPCA_FTMR$var$cos,2)
##                  Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
## height            0.74  0.07  0.00  0.04  0.00
## diameter          0.90  0.00  0.00  0.01  0.00
## hopper_slope      0.79  0.00  0.01  0.07  0.04
## hopper_orifice    0.24  0.54  0.00  0.09  0.00
## thickness         0.08  0.66  0.04  0.03  0.14
## mount_protrusion  0.05  0.28  0.55  0.06  0.04
## mount_width       0.68  0.00  0.00  0.12  0.00
## mount_height      0.74  0.01  0.01  0.01  0.04
## m_aper_height     0.65  0.08  0.02  0.11  0.04
## m_aper_width      0.82  0.02  0.00  0.02  0.00
## mount_mortice     0.35  0.10  0.28  0.11  0.15
# Colour plotting 
# Had to tweak mycols to "as.factor" instead of "as.numeric" in cols2, as.numeric returned only NA:s
# Plotting with these color schemes doesn't work in this version of R
cols2 <- mycols[as.factor(peacock$type)]
cols2
##  [1] "red"   "blue"  "blue"  "blue"  "blue"  "blue"  "blue"  "black" "blue" 
## [10] "green" "green" "blue"  "red"   "blue"  "black" "blue"  "green" "green"
## [19] "blue"  "blue"  "blue"  "green" "green" "green" "blue"  "green" "blue" 
## [28] "black" "blue"  "green" "blue"  "red"   "red"   "blue"  "red"   "blue" 
## [37] "green" "red"   "cyan"  "cyan"  "blue"  "cyan"
## plot(peacockPCA_FTMR, choix ='ind', col.ind=cols2, title="PCA and Peacock's Mills")

cols3<-mycols[as.numeric(peacock.k5$cluster)]
cols3
##  [1] "cyan"  "cyan"  "blue"  "cyan"  "cyan"  "cyan"  "cyan"  "red"   "cyan" 
## [10] "black" "black" "blue"  "red"   "cyan"  "red"   "cyan"  "green" "green"
## [19] "cyan"  "cyan"  "cyan"  "black" "black" "black" "blue"  "black" "cyan" 
## [28] "red"   "cyan"  "blue"  "cyan"  "red"   "red"   "cyan"  "red"   "cyan" 
## [37] "blue"  "cyan"  "black" "black" "red"   "black"
## plot(peacockPCA_FTMR, choix ='ind', col.ind=cols3, title="PCA and Peacock's Mills")

6: Aoristic Analysis

Simple Aoristic Analysis

# Setting up
setwd("~/R/SADE/week6")
library(devtools)

# Downloading the archSeries package
devtools::install_github("davidcorton/archSeries")
library(archSeries)

# Data
mydata <- read.csv(file="mydata.csv", header=TRUE, sep=",")

head(mydata)
##         objecttype broadperio fromdate todate  old_findID
## 1      END SCRAPER  NEOLITHIC    -3500   -800 WILT-7FC4B2
## 2      END SCRAPER  NEOLITHIC    -2500  -2100 WILT-7FA0D7
## 3      END SCRAPER  NEOLITHIC    -2500  -2100 WILT-7F7A04
## 4              AWL BRONZE AGE    -2150   -800 WILT-77CB78
## 5           ARMLET    UNKNOWN    -1200    410 WILT-BB14B5
## 6 SOCKETED AXEHEAD BRONZE AGE    -1150   -800 WILT-CB58F4
nrow(mydata)
## [1] 1859
# Renaming columns to comply with archSeries
names(mydata)[names(mydata)=="fromdate"] <- "Start"
names(mydata)[names(mydata)=="todate"] <- "End"

# Coercing NA:s to zeros, then removing all the rows in which Start or End date is zero
mydata[is.na(mydata)] <- 0
mydata<-subset(mydata, Start!="0")
mydata<-subset(mydata, End!="0")

# Inspecting for typos in Start and End dates
mydata$subtract <- (mydata$End - mydata$Start)
mydata <- mydata[order(mydata$subtract),]
head(mydata)
##      objecttype broadperio Start  End  old_findID subtract
## 1859       COIN   MEDIEVAL 12222 1236 WILT-9105D6   -10986
## 1348     BROOCH   IRON AGE   400  200 WILT-89BBAC     -200
## 1655  STRAP END   MEDIEVAL  1300 1250 WILT-1A4B16      -50
## 160        COIN   IRON AGE    58   43 WILT-6F6F27      -15
## 161        COIN   IRON AGE    58   43 WILT-6F2D79      -15
## 1003       COIN      ROMAN   347  340   BM-B6331D       -7
# Deleting the typoed rows with earlier End date than Start date 
mydata <- mydata[!mydata$subtract < 0, ]

# Simplifying the data by selecting only the vectors we need for the analysis
mydata <- mydata[,c("objecttype", "Start","End")]

# Aoristic weighing + plot
aorist<-aorist(mydata, start.date=1000, end.date=1600, bin.width=20)
aorist.plot(aorist, opacity=80, ylab="Aoristic Sum")

summary(mydata)
##   objecttype            Start              End         
##  Length:1844        Min.   :-3500.0   Min.   :-2100.0  
##  Class :character   1st Qu.:  275.0   1st Qu.:  326.0  
##  Mode  :character   Median :  337.0   Median :  361.0  
##                     Mean   :  533.4   Mean   :  626.4  
##                     3rd Qu.:  911.5   3rd Qu.: 1200.0  
##                     Max.   : 1800.0   Max.   : 1900.0
# Dividing data into "coin" and artifacts
coin <- mydata[mydata$objecttype == "COIN", ]
notcoin <- mydata[mydata$objecttype != "COIN", ]

# Aoristic weighing + plot for coins
aorist_coin<-aorist(coin, start.date=1000, end.date=1600, bin.width=20)
aorist.plot(aorist_coin, opacity=80, ylab="Aoristic Sum for Coin")

# Aoristic weighing + plot for artifacts
aorist_notcoin<-aorist(notcoin, start.date=1000, end.date=1600, bin.width=20)
aorist.plot(aorist_notcoin, opacity=80, ylab="Aoristic Sum for Coin")

Monte Carlo Simulation

# Creating unique ID row for the analysis
mydata$ID <- c(1:nrow(mydata))

# Simulating distributions
mydata_sim<-date.simulate(mydata, start.date=1000, end.date=1600, bin.width=20, reps=1000)

# Plotting the results
lines.chron(mydata_sim)

poly.chron(mydata_sim)

box.chron(mydata_sim)

# Same for coin and artifacts
coin$ID <- c(1:nrow(coin))
notcoin$ID <- c(1:nrow(notcoin))

# coin
coin_sim<-date.simulate(coin, start.date=1000, end.date=1600, bin.width=20, reps=1000)

lines.chron(coin_sim)

poly.chron(coin_sim)

box.chron(coin_sim)

# notcoin
notcoin_sim<-date.simulate(notcoin, start.date=1000, end.date=1600, bin.width=20, reps=1000)

lines.chron(notcoin_sim)

poly.chron(notcoin_sim)

box.chron(notcoin_sim)

Beta Distribution

# Creating data
data = seq(0,1, length=100)

# Applying a probability distribution function to it with different parameters of α and β:
# α = 1 and β = 1 (each bin is given equal importance)
plot(data, dbeta(data, 1, 1), ylab="density", type ="l", col="black")

# α = 2 and β = 2 (prioritizes the middle at the expense of edge bins)
plot(data, dbeta(data, 2, 2), ylab="density", type ="l", col="red")

# Applying to our earlier data with α = 2 and β = 2
mydata_sim_2<-date.simulate(mydata, start.date=1000, end.date=1600, bin.width=20, reps=1000, a=2, b=2)

lines.chron(mydata_sim_2)

poly.chron(mydata_sim_2)

box.chron(mydata_sim_2)

Final assignment

Task 1

Using Dataset 1, a) plot medieval settlements in the county of Kent, and b) run K, L and Pair Correlation Function analysis with 99 runs of Monte Carlo simulation on the data.

# Setting up WD and necessary libraries
setwd("~/R/SADE/week7")
library(rgdal)   
library(raster)   
library(spatstat) 
library(maptools)
library(GISTools)  


# Loading and plotting the polygon "kent"
polyg <- readOGR(dsn="kent")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week7\kent", layer: "kent"
## with 1 features
## It has 1 fields
plot(polyg)

# Adding Digital Elevation Model to the plot to visualize topography
dem <- raster("dem_england/dem_england_historic.tif")
plot(dem, add=T, col=terrain.colors(15))

# Adding the medieval settlements as points to the plot
sett <- readOGR(dsn="dataset1")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\Uine\Documents\R\SADE\week7\dataset1", layer: "dataset1"
## with 294 features
## It has 5 fields
## Integer64 fields read as strings:  value_1086
points(sett, pch=19, cex=0.2)

# Omitting points that might fall outside of the polygon 
sett <- sett[polyg, ]

# Inspecting how "dataset1" looks like
head(sett)
##   OSREFS Min_Eastin Min_Northi    vill_name value_1086
## 1 TQ4454     544000     154000    Westerham         40
## 2 TQ4655     546000     155000      Brasted         17
## 3 TQ4854     548000     154000    Sundridge         18
## 4 TQ5259     552000     159000       Otford         60
## 5 TQ5264     552000     164000 Lullingstone          5
## 6 TQ5465     554000     165000     Eynsford         20
summary(sett)
## Object of class SpatialPointsDataFrame
## Coordinates:
##              min      max
## coords.x1 544000 637000.0
## coords.x2 118000 176440.7
## Is projected: TRUE 
## proj4string :
## [+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000
## +y_0=-100000 +ellps=airy +units=m +no_defs]
## Number of points: 294
## Data attributes:
##     OSREFS            Min_Eastin       Min_Northi      vill_name        
##  Length:294         Min.   :544000   Min.   :118000   Length:294        
##  Class :character   1st Qu.:574000   1st Qu.:146000   Class :character  
##  Mode  :character   Median :599500   Median :153500   Mode  :character  
##                     Mean   :595755   Mean   :152724                     
##                     3rd Qu.:614000   3rd Qu.:160000                     
##                     Max.   :637000   Max.   :177000                     
##   value_1086       
##  Length:294        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
# Converting the data into spastat object for analysis
sp_sett <- as.ppp(coordinates(sett),as.owin(polyg))

# K-Function analysis
# "Pois" is the expected Poisson line, the others are how the data compares to it with a few different edge corrections
k_fun <- Kest(sp_sett)
plot(k_fun, xlim=c(0,5000), main="K-Function analysis")

# L-Function analysis
# Same as K but "Pois" is straightened to a straight line
l_fun <- Lest(sp_sett)
plot(l_fun, xlim=c(0,5000), main="L-Function analysis")

# Pair Correlation Function analysis (PCF)
# Same principle as K, but different process ("donut rings")
# Expected Poisson is the green horizontal line
pc_fun <- pcf(sp_sett)
plot(pc_fun, xlim=c(0,5000), main="Pair Correlation Function analysis")

# Monte Carlo Simulation
pc_fun_100 <- envelope(sp_sett, pcf, nsim=99)
## Generating 99 simulations of CSR  ...
## 1, 2,  [etd 5:40] 3,  [etd 5:14] 4,
##  [etd 6:07] 5,  [etd 5:54] 6,  [etd 5:49] 7,  [etd 5:37] 8,
##  [etd 5:46] 9,  [etd 5:37] 10,  [etd 5:38] 11,  [etd 5:37] 12,
##  [etd 5:27] 13,  [etd 5:23] 14,  [etd 5:21] 15,  [etd 5:15] 16,
##  [etd 5:10] 17,  [etd 5:04] 18,  [etd 5:04] 19,  [etd 5:05] 20,
##  [etd 5:00] 21,  [etd 4:57] 22,  [etd 4:53] 23,  [etd 4:52] 24,
##  [etd 4:50] 25,  [etd 4:47] 26,  [etd 4:45] 27,  [etd 4:41] 28,
##  [etd 4:39] 29,  [etd 4:37] 30,  [etd 4:36] 31,  [etd 4:31] 32,
##  [etd 4:29] 33,  [etd 4:24] 34,  [etd 4:20] 35,  [etd 4:16] 36,
##  [etd 4:11] 37,  [etd 4:09] 38,  [etd 4:06] 39,  [etd 4:02] 40,
##  [etd 3:56] 41,  [etd 3:51] 42,  [etd 3:48] 43,  [etd 3:44] 44,
##  [etd 3:39] 45,  [etd 3:35] 46,  [etd 3:31] 47,  [etd 3:26] 48,
##  [etd 3:22] 49,  [etd 3:18] 50,  [etd 3:14] 51,  [etd 3:10] 52,
##  [etd 3:06] 53,  [etd 3:01] 54,  [etd 2:58] 55,  [etd 2:54] 56,
##  [etd 2:50] 57,  [etd 2:46] 58,  [etd 2:41] 59,  [etd 2:37] 60,
##  [etd 2:33] 61,  [etd 2:29] 62,  [etd 2:25] 63,  [etd 2:21] 64,
##  [etd 2:16] 65,  [etd 2:12] 66,  [etd 2:09] 67,  [etd 2:05] 68,
##  [etd 2:01] 69,  [etd 1:57] 70,  [etd 1:53] 71,  [etd 1:49] 72,
##  [etd 1:45] 73,  [etd 1:41] 74,  [etd 1:37] 75,  [etd 1:33] 76,
##  [etd 1:29] 77,  [etd 1:25] 78,  [etd 1:21] 79,  [etd 1:18] 80,
##  [etd 1:14] 81,  [etd 1:10] 82,  [etd 1:06] 83,  [etd 1:02] 84,
##  [etd 58 sec] 85,  [etd 54 sec] 86,  [etd 51 sec] 87,  [etd 47 sec] 88,
##  [etd 43 sec] 89,  [etd 39 sec] 90,  [etd 35 sec] 91,  [etd 31 sec] 92,
##  [etd 27 sec] 93,  [etd 23 sec] 94,  [etd 19 sec] 95,  [etd 16 sec] 96,
##  [etd 12 sec] 97,  [etd 8 sec] 98,  [etd 4 sec]  99.
## 
## Done.
plot(pc_fun_100, xlim=c(0,20000), main="South: PCF with 99 MC Simulations")

Task 2

Using Dataset 2, a) plot the distribution of medieval object findspots in the county of Kent. b) Perform kernel density analysis on the findspots, using a few different search radii. c) Also perform a relative risk surface analysis, using the object type “seal matrix”.

Task 3

  1. Perform aoristic analysis on Dataset 3, both on the whole dataset and on just the seal matrices.
  2. Include the graphs for aoristic weighing (bar charts) and
  3. for aoristic analysis with MC simulation (your choice of line, polygon “blocks colour” or boxplot charts), both for the overall and the seal matrices analyses.

Task 4

Perform PCA on dataset 4.

Task 5

Perform cluster analysis on dataset 4.